Search | arXiv e-print repository

PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking

Authors: Chongwei Liu, Haojie Li, Zhihui Wang, Rui Xu

Abstract: Recent advances in Multi-Object Tracking (MOT) have achieved remarkable success in short-term association within the decoupled tracking-by-detection online paradigm. However, long-term tracking still remains a challenging task. Although graph-based approaches can address this issue by modeling trajectories as a graph in the decoupled manner, their non-online nature poses obstacles for real-time applications. In this paper, we demonstrate that the trajectory graph is a directed acyclic graph, which can be represented by an object sequence arranged by frame and a binary adjacency matrix. It is a coincidence that the binary matrix matches the attention mask in the Transformer, and the object sequence serves exactly as a natural input sequence. Intuitively, we propose that a pure Transformer can naturally unify short- and long-term associations in a decoupled and online manner. Our experiments show that a classic Transformer architecture naturally suits the association problem and achieves a strong baseline compared to existing foundational methods across four datasets: DanceTrack, SportsMOT, MOT17, and MOT20, as well as superior generalizability in domain shift. Moreover, the decoupled property also enables efficient training and inference. This work pioneers a promising Transformer-based approach for the MOT task, and provides code to facilitate further research. https://github.com/chongweiliu/PuTR

Submitted 22 May, 2024; originally announced May 2024.
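
Illustrative note (not part of the listing): the abstract's observation that a frame-ordered object sequence plus a binary adjacency matrix can double as a Transformer attention mask can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `frame_causal_mask`, the choice to let each object attend to itself, and the example frame indices are all hypothetical.

```python
import numpy as np

def frame_causal_mask(frame_ids):
    """Binary attention mask for an object sequence sorted by frame.

    mask[i, j] is True when object i may attend to object j, i.e. when
    object j comes from a strictly earlier frame, or when j is i itself.
    Edges only point from earlier frames to later ones, so the implied
    graph is a directed acyclic graph.
    """
    f = np.asarray(frame_ids)
    earlier = f[None, :] < f[:, None]          # frame of column j < frame of row i
    return earlier | np.eye(len(f), dtype=bool)

# Hypothetical sequence: two objects in frame 0, one in frame 1, two in frame 2.
mask = frame_causal_mask([0, 0, 1, 2, 2])
print(mask.astype(int))
```

Objects in the same frame do not attend to each other in this sketch, which is one plausible reading of "arranged by frame"; the paper itself should be consulted for the exact masking rule.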

arXiv:2405.13745 [pdf, other]

NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation

Authors: Qiujie Dong, Huibiao Wen, Rui Xu, Xiaokang Yu, Jiaran Zhou, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang

Abstract: Quadrilateral mesh generation plays a crucial role in numerical simulations within Computer-Aided Design and Engineering (CAD/E). The quality of the cross field is essential for generating a quadrilateral mesh. In this paper, we propose a self-supervised neural representation of the cross field, named NeurCross, comprising two modules: one to fit the signed distance function (SDF) and another to predict the cross field. Unlike most existing approaches that operate directly on the given polygonal surface, NeurCross takes the SDF as a bridge to allow for SDF overfitting and the prediction of the cross field to proceed simultaneously. By utilizing a neural SDF, we achieve a smooth representation of the base surface, minimizing the impact of piecewise planar discretization and minor surface variations. Moreover, the principal curvatures and directions are fully encoded by the Hessian of the SDF, enabling the regularization of the overall cross field through minor adjustments to the SDF. Compared to state-of-the-art methods, NeurCross significantly improves the placement of singular points and the approximation accuracy between the input triangular surface and the output quad mesh, as demonstrated in the teaser figure.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13729 [pdf, other]

ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

Authors: Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang

Abstract: In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, there are additional attributes which are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models, causing degraded test time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test time generation which uses unsynchronized time steps for different dimensions and attributes, thus allowing for varying degrees of control over them.

Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.12369 [pdf, other]

AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field

Authors: Rong Liu, Rui Xu, Yue Hu, Meida Chen, Andrew Feng

Abstract: 3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost of adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. The Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. The strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we propose a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods. More interactive demos can be found on our website (https://rongliu-leo.github.io/AtomGS/).

Submitted 22 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11083 [pdf, other]

Prompt Exploration with Prompt Regression

Authors: Michael Feffer, Ronald Xu, Yuekai Sun, Mikhail Yurochkin

Abstract: With the advent of democratized usage of large language models (LLMs), there is a growing desire to systematize LLM prompt creation and selection processes beyond iterative trial-and-error. Prior works mainly focus on searching the space of prompts without accounting for relations between prompt variations. Here we propose a framework, Prompt Exploration with Prompt Regression (PEPR), to predict the effect of prompt combinations given results for individual prompt elements as well as a simple method to select an effective prompt for a given use-case. We evaluate our approach with open-source LLMs of different sizes on several different tasks.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10370 [pdf, other]

Grounded 3D-LLM with Referent Tokens

Authors: Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang

Abstract: Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D LMMs) to consolidate various 3D vision tasks within a unified generative framework. The model uses scene referent tokens as special noun phrases to reference 3D scenes, enabling the handling of sequences that interleave 3D and textual data. It offers a natural approach for translating 3D vision tasks into language formats using task-specific instruction templates. To facilitate the use of referent tokens in subsequent language modeling, we have curated large-scale grounded language datasets that offer finer scene-text correspondence at the phrase level by bootstrapping existing object labels. Subsequently, we introduced Contrastive LAnguage-Scene Pre-training (CLASP) to effectively leverage this data, thereby integrating 3D vision with language models. Our comprehensive evaluation covers open-ended tasks like dense captioning and 3D QA, alongside close-ended tasks such as object detection and language grounding. Experiments across multiple 3D benchmarks reveal the leading performance and the broad applicability of Grounded 3D-LLM. Code and datasets will be released on the project page: https://groundedscenellm.github.io/grounded_3d-llm.github.io.

Submitted 16 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2405.09225 [pdf, other]

Exploring Ground States of Fermi-Hubbard Model on Honeycomb Lattices with Counterdiabaticity

Authors: Jialiang Tang, Ruoqian Xu, Yongcheng Ding, Xusheng Xu, Yue Ban, Manhong Yung, Axel Pérez-Obiol, Gloria Platero, Xi Chen

Abstract: Exploring the ground state properties of many-body quantum systems conventionally involves adiabatic processes, alongside exact diagonalization, in the context of quantum annealing or adiabatic quantum computation. Shortcuts to adiabaticity by counter-diabatic driving serve to accelerate these processes by suppressing energy excitations. Motivated by this, we develop variational quantum algorithms incorporating the auxiliary counterdiabatic interactions, comparing them with digitized adiabatic algorithms. These algorithms are then implemented on gate-based quantum circuits to explore the ground states of the Fermi-Hubbard model on honeycomb lattices, utilizing systems with up to 26 qubits. The comparison reveals that the counter-diabatic inspired ansatz is superior to traditional Hamiltonian variational ansatz. Furthermore, the number and duration of Trotter steps are analyzed to understand and mitigate errors. Given the model's relevance to materials in condensed matter, our study paves the way for using variational quantum algorithms with counterdiabaticity to explore quantum materials in the noisy intermediate-scale quantum era.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 11 pages, 6 figures

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$\sigma$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.07644 [pdf, other]

doi 10.1145/3610548.3618191

A Hessian-Based Field Deformer for Real-Time Topology-Aware Shape Editing

Authors: Yunxiao Zhang, Zixiong Wang, Zihan Zhao, Rui Xu, Shuangmin Chen, Shiqing Xin, Wenping Wang, Changhe Tu

Abstract: Shape manipulation is a central research topic in computer graphics. Topology editing, such as breaking apart connections, joining disconnected ends, and filling/opening a topological hole, is generally more challenging than geometry editing. In this paper, we observe that the saddle points of the signed distance function (SDF) provide useful hints for altering surface topology deliberately. Based on this key observation, we parameterize the SDF into a cubic trivariate tensor-product B-spline function $F$ whose saddle points $\{\boldsymbol{s}_i\}$ can be quickly exhausted based on a subdivision-based root-finding technique coupled with Newton's method. Users can select one of the candidate points, say $\boldsymbol{s}_i$, to edit the topology in real time. In implementation, we add a compactly supported B-spline function rooted at $\boldsymbol{s}_i$, which we call a \textit{deformer} in this paper, to $F$, with its local coordinate system aligning with the three eigenvectors of the Hessian. Combined with a ray marching technique, our interactive system operates at 30 FPS. Additionally, our system empowers users to create desired bulges or concavities on the surface. An extensive user study indicates that our system is user-friendly and intuitive to operate. We demonstrate the effectiveness and usefulness of our system in a range of applications, including fixing surface reconstruction errors, artistic work design, 3D medical imaging and simulation, and antiquity restoration. Please refer to the attached video for a demonstration.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 10 pages, 18 figures

arXiv:2405.07303 [pdf, other]

Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment

Authors: L. T. Yang, S. K. Liu, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first limit on $g_{A\gamma}$ coupling constant using the Bragg-Primakoff conversion based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{A\gamma}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) are derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95\% C.L.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 7 pages, 5 figures

arXiv:2405.07152 [pdf, other]

On the energy budget of starquake-induced repeating fast radio bursts

Authors: Wei-Yang Wang, Chen Zhang, Enping Zhou, Xiaohui Liu, Jiarui Niu, Zixuan Zhou, He Gao, Jifeng Liu, Renxin Xu, Bing Zhang

Abstract: With a growing sample of fast radio bursts (FRBs), we investigate the energy budget of different power sources within the framework of the magnetar starquake triggering mechanism. During a starquake, the energy can be released in any form through magnetic, strain, rotational, and gravitational energies. The following findings are revealed: 1. The crust can store a free magnetic energy of at least $6.3\times10^{46}$ erg via toroidal fields, with frequent starquakes happening due to the instability of the crust. 2. The strain energy develops as a rigid object spins down, which can be released during a global starquake accompanied by a glitch. However, it takes a long time to accumulate enough strain energy via spin-down. 3. The rotational energy of a magnetar with $P\lesssim0.1\rm\,s$ can match the energy and luminosity budget of FRBs. 4. The budget of the total gravitational energy is high, but the mechanism and efficiency of converting this energy to radiation deserve further exploration.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 10 pages, 2 figures. Submitted. Some intriguing FAST's results are expected!

arXiv:2405.06359 [pdf, other]

Quantum Krylov-Subspace Method Based Linear Solver

Authors: Rui-Bin Xu, Zhu-Jun Zheng, Zheng Zheng

Abstract: Despite the successful enhancement to the Harrow-Hassidim-Lloyd algorithm by Childs et al., who introduced the Fourier approach leveraging linear combinations of unitary operators, our research has identified non-trivial redundancies within this method. This finding points to a considerable potential for refinement. In this paper, we propose the quantum Krylov-subspace method (QKSM), which is a hybrid classical-quantum algorithm, to mitigate such redundancies. By integrating QKSM as a subroutine, we introduce the quantum Krylov-subspace method based linear solver that not only reduces computational redundancy but also enhances efficiency and accuracy. Extensive numerical experiments, conducted on systems with dimensions up to $2^{10} \times 2^{10}$, have demonstrated a significant reduction in computational resources and have led to more precise approximations.

Submitted 10 May, 2024; originally announced May 2024.
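
Illustrative note (not part of the listing): for readers unfamiliar with Krylov-subspace solvers, the purely classical idea can be sketched as below. This is not the QKSM of the abstract, which is a hybrid classical-quantum algorithm; it is only a classical analogue under stated assumptions. The helper names `krylov_basis` and `krylov_solve`, the random test matrix, and the subspace size are all hypothetical choices made for illustration.

```python
import numpy as np

def krylov_basis(A, b, m):
    """Orthonormal basis of span{b, Ab, ..., A^(m-1) b} via Gram-Schmidt."""
    Q = np.zeros((len(b), m))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(1, m):
        v = A @ Q[:, k - 1]
        for j in range(k):                      # remove components along earlier vectors
            v -= (Q[:, j] @ v) * Q[:, j]
        Q[:, k] = v / np.linalg.norm(v)         # assumes no breakdown (v != 0)
    return Q

def krylov_solve(A, b, m):
    """Approximate the solution of A x = b within the m-dimensional Krylov subspace."""
    Q = krylov_basis(A, b, m)
    y, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)   # minimise ||A Q y - b||
    return Q @ y

rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite test matrix
b = rng.standard_normal(n)

x = krylov_solve(A, b, m=n)        # with m = n the subspace is the whole space
residual = np.linalg.norm(A @ x - b)
print(residual)
```

Choosing `m` smaller than `n` trades accuracy for cost, which is the usual reason to work in a Krylov subspace at all.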

arXiv:2405.04434 [pdf, other]

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03000 [pdf, other]

MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning

Authors: Wenqi Shi, Ran Xu, Yuchen Zhuang, Yue Yu, Hang Wu, Carl Yang, May D. Wang

Abstract: Despite their improved capabilities in generation and reasoning, adapting large language models (LLMs) to the biomedical domain remains challenging due to their immense size and corporate privacy. In this work, we propose MedAdapter, a unified post-hoc adapter for test-time adaptation of LLMs towards biomedical applications. Instead of fine-tuning the entire LLM, MedAdapter effectively adapts the original model by fine-tuning only a small BERT-sized adapter to rank candidate solutions generated by LLMs. Experiments demonstrate that MedAdapter effectively adapts both white-box and black-box LLMs in biomedical reasoning, achieving average performance improvements of 25.48% and 11.31%, respectively, without requiring extensive computational resources or sharing data with third parties. MedAdapter also yields superior performance when combined with train-time adaptation, highlighting a flexible and complementary solution to existing adaptation methods. Faced with the challenges of balancing model performance, computational resources, and data privacy, MedAdapter provides an efficient, privacy-preserving, cost-effective, and transparent solution for adapting LLMs to the biomedical domain.

Submitted 5 May, 2024; originally announced May 2024.

Comments: Work in Progress

arXiv:2405.02724 [pdf, ps, other]

Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

Authors: Yingjie Fei, Ruitu Xu

Abstract: We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the other agents. To address such deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 29 pages
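
Illustrative note (not part of the listing): the entropic risk measure mentioned in the abstract is commonly written as below for a random reward $X$ and a risk parameter $\beta \neq 0$. The symbols $\rho_\beta$, $X$, and $\beta$ are notation chosen here, and the paper's own parameterization or sign convention may differ.

```latex
\rho_\beta(X) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[e^{\beta X}\right],
\qquad
\rho_\beta(X) \;=\; \mathbb{E}[X] + \frac{\beta}{2}\,\mathrm{Var}(X) + O(\beta^2)
\quad (\beta \to 0).
```

Under this convention, $\beta < 0$ penalizes reward variance (risk-averse), $\beta > 0$ rewards it (risk-seeking), and the limit $\beta \to 0$ recovers the ordinary expectation, which is why agents with different $\beta$ can value the same policy differently.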

arXiv:2405.02522 [pdf]

New contexts, old heuristics: How young people in India and the US trust online content in the age of generative AI

Authors: Rachel Xu, Nhu Le, Rebekah Park, Laura Murray, Vishnupriya Das, Devika Kumar, Beth Goldberg

Abstract: We conducted an in-person ethnography in India and the US to investigate how young people (18-24) trusted online content, with a focus on generative AI (GenAI). We had four key findings about how young people use GenAI and determine what to trust online. First, when online, we found participants fluidly shifted between mindsets and emotional states, which we term "information modes." Second, these information modes shaped how and why participants trust GenAI and how they applied literacy skills. In the modes where they spent most of their time, they eschewed literacy skills. Third, with the advent of GenAI, participants imported existing trust heuristics from familiar online contexts into their interactions with GenAI. Fourth, although study participants had reservations about GenAI, they saw it as a requisite tool to adopt to keep up with the times. Participants valued efficiency above all else, and used GenAI to further their goals quickly at the expense of accuracy. Our findings suggest that young people spend the majority of their time online not concerned with truth because they are seeking only to pass the time. As a result, literacy interventions should be designed to intervene at the right time, to match users' distinct information modes, and to work with their existing fact-checking practices.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 14 pages

arXiv:2405.00238 [pdf, other]

Noise reduction by bias cooling in gated Si/Si$_x$Ge$_{1-x}$ quantum dots

Authors: Julian Ferrero, Thomas Koch, Sonja Vogel, Daniel Schroller, Viktor Adam, Ran Xue, Inga Seidler, Lars R. Schreiber, Hendrik Bluhm, Wolfgang Wernsdorfer

Abstract: Silicon-Germanium heterostructures are a promising quantum circuit platform, but crucial aspects such as the long-term charge dynamics and cooldown-to-cooldown variations are still widely unexplored quantitatively. In this letter we present the results of an extensive bias cooling study performed on gated silicon-germanium quantum dots with an Al2O3-dielectric. Over 80 cooldowns were performed in the course of our investigations. The performance of the devices is assessed by low-frequency charge noise measurements in the band of 200 micro Hertz to 10 milli Hertz. We measure the total noise power as a function of the applied voltage during cooldown in four different devices and find a minimum in noise at 0.7 V bias cooling voltage for all observed samples. We manage to decrease the total noise power median by a factor of 6 and compute a reduced tunneling current density using Schrödinger-Poisson simulations. Furthermore, we show the variation in noise from the same device in the course of eleven different cooldowns performed under the nominally same conditions.

Submitted 8 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures, silicon, SiGe, heterostructure, bias cooling, experiment, simulation

arXiv:2404.19641 [pdf]

Fast and label-free 3D virtual H&E histology via active modulation-assisted dynamic full-field OCT

Authors: Zichen Yin, Bin He, Yuzhe Ying, Shuwei Zhang, Panqi Yang, Zhengyu Chen, Zhangwei Hu, Yejiong Shi, Ruizhi Xue, Chengming Wang, Shu Wang, Guihuai Wang, Ping Xue

Abstract: Pathological features are the gold standard for tumor diagnosis, guiding treatment and prognosis. However, the standard histopathological process is labor-intensive and time-consuming, while frozen sections have lower accuracy. Dynamic full-field optical coherence tomography (D-FFOCT) offers rapid histologic information by measuring the subcellular dynamics of fresh, unprocessed tissues. However, D-FFOCT images suffer from abrupt shifts in hue and brightness, which confuse pathologists and diminish the images' interpretability and reliability. Here, we present active phase modulation-assisted D-FFOCT (APMD-FFOCT) to improve the imaging stability and enhance the contrast of static tissues. This enables us to further employ unsupervised deep learning to convert APMD-FFOCT images into virtual hematoxylin and eosin (H&E) stained images for the first time. Three-dimensional (3D) virtual H&E-stained images have been obtained at a scanning rate of 1 frame per second, as demonstrated in cancer diagnosis for human central nervous system and breast. The results prove that this new method will play a unique and important role in intraoperative histology.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.18861 [pdf, other]

A Survey on Vision Mamba: Models, Applications and Challenges

Authors: Rui Xu, Shu Yang, Yihui Wang, Bo Du, Hao Chen

Abstract: Mamba, a recent selective structured state space model, performs excellently on long sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional neural networks and offers advanced modeling capabilities similar to those of Transformers, through global receptive fields and dynamic weighting. Crucially, it achieves this without incurring the quadratic computational complexity typically associated with Transformers. Due to its advantages over the former two mainstream foundation models, Mamba exhibits great potential to be a visual foundation model. Researchers are actively applying Mamba to various computer vision tasks, leading to numerous emerging works. To help keep pace with the rapid advancements in computer vision, this paper aims to provide a comprehensive review of visual Mamba approaches. This paper begins by delineating the formulation of the original Mamba model. Subsequently, our review of visual Mamba delves into several representative backbone networks to elucidate the core insights of the visual Mamba. We then categorize related works using different modalities, including image, video, point cloud, multi-modal, and others. Specifically, for image applications, we further organize them into distinct tasks to facilitate a more structured discussion. Finally, we discuss the challenges and future research directions for visual Mamba, providing insights for future research in this quickly evolving area.
A comprehensive list of visual Mamba models reviewed in this work is available at https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18824 [pdf, other]

Benchmarking Benchmark Leakage in Large Language Models

Authors: Ruijie Xu, Zengzhi Wang, Run-Ze Fan, Pengfei Liu

Abstract: Amid the expanding use of pre-training data, the phenomenon of benchmark dataset leakage has become increasingly prominent, exacerbated by opaque training processes and the often undisclosed inclusion of supervised data in contemporary Large Language Models (LLMs). This issue skews benchmark effectiveness and fosters potentially unfair comparisons, impeding the field's healthy development. To address this, we introduce a detection pipeline utilizing Perplexity and N-gram accuracy, two simple and scalable metrics that gauge a model's prediction precision on a benchmark, to identify potential data leakages. By analyzing 31 LLMs under the context of mathematical reasoning, we reveal substantial instances of training set and even test set misuse, resulting in potentially unfair comparisons. These findings prompt us to offer several recommendations regarding model documentation, benchmark setup, and future evaluations. Notably, we propose the "Benchmark Transparency Card" to encourage clear documentation of benchmark utilization, promoting transparency and healthy developments of LLMs. We have made our leaderboard, pipeline implementation, and model predictions publicly available, fostering future research.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 30 pages; Homepage: https://gair-nlp.github.io/benbench

arXiv:2404.18443 [pdf, other]

BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

Authors: Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Yanqiao Zhu, May D. Wang, Joyce C. Ho, Chao Zhang, Carl Yang

Abstract: Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but still challenging due to the lack of sufficient publicly annotated biomedical data and computational resources. We present BMRetriever, a series of dense retrievers for enhancing biomedical retrieval via unsupervised pre-training on large biomedical corpora, followed by instruction fine-tuning on a combination of labeled datasets and synthetic pairs. Experiments on 5 biomedical tasks across 11 datasets verify BMRetriever's efficacy on various biomedical applications. BMRetriever also exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger, and the 2B variant matching the performance of models with over 5B parameters. The training data and model checkpoints are released at https://huggingface.co/BMRetriever to ensure transparency, reproducibility, and application to new domains.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Work in progress. The model and data will be uploaded to https://github.com/ritaranx/BMRetriever

arXiv:2404.18231 [pdf, other]

From Persona to Personalization: A Survey on Role-Playing Language Agents

Authors: Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, Wei Shi, Jian Xie, Shuang Li, Ruihan Yang, Tinghui Zhu, Aili Chen, Nianqi Li, Lida Chen, Caiyu Hu, Siye Wu, Scott Ren, Ziquan Fu, Yanghua Xiao

Abstract: Recent advancements in large language models (LLMs) have significantly boosted the rise of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate assigned personas. By harnessing multiple advanced abilities of LLMs, including in-context learning, instruction following, and social intelligence, RPLAs achieve a remarkable sense of human likeness and vivid role-playing performance. RPLAs can mimic a wide range of personas, ranging from historical figures and fictional characters to real-life individuals. Consequently, they have catalyzed numerous AI applications, such as emotional companions, interactive video games, personalized assistants and copilots, and digital clones. In this paper, we conduct a comprehensive survey of this field, illustrating the evolution and recent progress in RPLAs integrating with cutting-edge LLM technologies. We categorize personas into three types: 1) Demographic Persona, which leverages statistical stereotypes; 2) Character Persona, focused on well-established figures; and 3) Individualized Persona, customized through ongoing user interactions for personalized services. We begin by presenting a comprehensive overview of current methodologies for RPLAs, followed by the details for each persona type, covering corresponding data sourcing, agent construction, and evaluation. Afterward, we discuss the fundamental risks, existing limitations, and future prospects of RPLAs.
Additionally, we provide a brief review of RPLAs in AI applications, which reflects practical user demands that shape and drive RPLA research. Through this work, we aim to establish a clear taxonomy of RPLA research and applications, and facilitate future research in this critical and ever-evolving field, and pave the way for a future where humans and RPLAs coexist in harmony.

Submitted 28 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.15773 [pdf, other]

Possible gapless quantum spin liquid behavior in the triangular-lattice Ising antiferromagnet PrMgAl$_{11}$O$_{19}$

Authors: Zhen Ma, Shuhan Zheng, Yingqi Chen, Ruokai Xu, Zhao-Yang Dong, Jinghui Wang, Hong Du, Jan Peter Embs, Shuaiwei Li, Yao Li, Yongjun Zhang, Meifeng Liu, Ruidan Zhong, Jun-Ming Liu, Jinsheng Wen

Abstract: Quantum spin liquids (QSLs) represent a novel state where spins are highly entangled but do not order even at zero temperature due to strong quantum fluctuations. Such a state is mostly studied in Heisenberg models defined on geometrically frustrated lattices. Here, we turn to a new triangular-lattice antiferromagnet PrMgAl$_{11}$O$_{19}$, in which the interactions are believed to be of Ising type. Magnetic susceptibility measured with an external field along the $c$ axis is two orders of magnitude larger than that with a field in the $ab$ plane, displaying an ideal easy-axis behavior. Meanwhile, there is no magnetic phase transition or spin freezing observed down to 1.8 K. Ultralow-temperature specific heat measured down to 50 mK does not capture any phase transition either, but a hump at 4.5 K, below which the magnetic specific heat exhibits a quasi-quadratic temperature dependence that is consistent with a Dirac QSL state. Inelastic neutron scattering technique is also employed to elucidate the nature of its ground state. In the magnetic excitation spectra, there is a gapless broad continuum at the base temperature 55 mK, in favor of the realization of a gapless QSL. Our results provide a rare example of QSL behaviors observed in an Ising-type magnet, which can serve as a promising platform for future research on QSL physics based on an Ising model.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures

Journal ref: Phys. Rev. B 109, 165143 (2024)

arXiv:2404.15661 [pdf, other]

CWF: Consolidating Weak Features in High-quality Mesh Simplification

Authors: Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu

Abstract: In mesh simplification, common requirements like accuracy, triangle quality, and feature alignment are often considered as a trade-off. Existing algorithms concentrate on just one or a few specific aspects of these requirements. For example, the well-known Quadric Error Metrics (QEM) approach prioritizes accuracy and can preserve strong feature lines/points as well but falls short in ensuring high triangle quality and may degrade weak features that are not as distinctive as strong ones. In this paper, we propose a smooth functional that simultaneously considers all of these requirements. The functional comprises a normal anisotropy term and a Centroidal Voronoi Tessellation (CVT) energy term, with the variables being a set of movable points lying on the surface. The former inherits the spirit of QEM but operates in a continuous setting, while the latter encourages even point distribution, allowing various surface metrics. We further introduce a decaying weight to automatically balance the two terms. We selected 100 CAD models from the ABC dataset, along with 21 organic models, to compare the existing mesh simplification algorithms with ours. Experimental results reveal an important observation: the introduction of a decaying weight effectively reduces the conflict between the two terms and enables the alignment of weak features. This distinctive feature sets our approach apart from most existing mesh simplification methods and demonstrates significant potential in shape understanding.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 14 pages, 22 figures

arXiv:2404.15533 [pdf, other]

Designing, simulating, and performing the 100-AV field test for the CIRCLES consortium: Methodology and Implementation of the Largest mobile traffic control experiment to date

Authors: Mostafa Ameli, Sean Mcquade, Jonathan W. Lee, Matthew Bunting, Matthew Nice, Han Wang, William Barbour, Ryan Weightman, Chris Denaro, Ryan Delorenzo, Sharon Hornstein, Jon F. Davis, Dan Timsit, Riley Wagner, Rita Xu, Malaika Mahmood, Mikail Mahmood, Maria Laura Delle Monache, Benjamin Seibold, Daniel B. Work, Jonathan Sprinkle, Benedetto Piccoli, Alexandre M. Bayen

Abstract: Previous controlled experiments on single-lane ring roads have shown that a single partially autonomous vehicle (AV) can effectively mitigate traffic waves. This naturally prompts the question of how these findings can be generalized to field operational, high-density traffic conditions. To address this question, the Congestion Impacts Reduction via CAV-in-the-loop Lagrangian Energy Smoothing (CIRCLES) Consortium conducted MegaVanderTest (MVT), a live traffic control experiment involving 100 vehicles near Nashville, TN, USA. This article is a tutorial for developing analytical and simulation-based tools essential for designing and executing a live traffic control experiment like the MVT. It presents an overview of the proposed roadmap and various procedures used in designing, monitoring, and conducting the MVT, which is the largest mobile traffic control experiment at the time. The design process is aimed at evaluating the impact of the CIRCLES AVs on surrounding traffic. The article discusses the agent-based traffic simulation framework created for this evaluation. A novel methodological framework is introduced to calibrate this microsimulation, aiming to accurately capture traffic dynamics and assess the impact of adding 100 vehicles to existing traffic. The calibration model's effectiveness is verified using data from a six-mile section of Nashville's I-24 highway.
The results indicate that the proposed model establishes an effective feedback loop between the optimizer and the simulator, thereby calibrating flow and speed with different spatiotemporal characteristics to minimize the error between simulated and real-world data. Finally, we simulate AVs in multiple scenarios to assess their effect on traffic congestion. This evaluation validates the AV routes, thereby contributing to the execution of a safe and successful live traffic control experiment via AVs.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14642 [pdf, other]

Uncertainty Quantification on Graph Learning: A Survey

Authors: Chao Chen, Chenghua Guo, Rui Xu, Xiangwen Liao, Xi Zhang, Sihong Xie, Hui Xiong, Philip Yu

Abstract: Graphical models, including Graph Neural Networks (GNNs) and Probabilistic Graphical Models (PGMs), have demonstrated their exceptional capabilities across numerous fields. These models necessitate effective uncertainty quantification to ensure reliable decision-making amid the challenges posed by model training discrepancies and unpredictable testing scenarios. This survey examines recent works that address uncertainty quantification within the model architectures, training, and inference of GNNs and PGMs. We aim to provide an overview of the current landscape of uncertainty in graphical models by organizing the recent methods into uncertainty representation and handling. By summarizing state-of-the-art methods, this survey seeks to deepen the understanding of uncertainty quantification in graphical models, thereby increasing their effectiveness and safety in critical applications.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13525 [pdf, other]

QR Decomposition of Dual Matrices and its Application to Traveling Wave Identification in the Brain

Authors: Renjie Xu, Tong Wei, Yimin Wei, Pengpeng Xie

Abstract: Matrix decompositions in dual number representations have played an important role in fields such as kinematics and computer graphics in recent years. In this paper, we present a QR decomposition algorithm for dual number matrices, specifically geared towards its application in traveling wave identification, utilizing the concept of proper orthogonal decomposition. When dealing with large-scale problems, we present explicit solutions for the QR, thin QR, and randomized QR decompositions of dual number matrices, along with their respective algorithms with column pivoting. The QR decomposition of dual matrices is an accurate first-order perturbation, with the Q-factor satisfying rigorous perturbation bounds, leading to enhanced orthogonality. In numerical experiments, we discuss the suitability of different QR algorithms when confronted with various large-scale dual matrices, providing their respective domains of applicability. Subsequently, we employed the QR decomposition of dual matrices to compute the DMPGI, thereby attaining results of higher precision. Moreover, we apply the QR decomposition in the context of traveling wave identification, employing the notion of proper orthogonal decomposition to perform a validation analysis of large-scale functional magnetic resonance imaging (fMRI) data for brain functional circuits. Our approach significantly improves the identification of two types of wave signals compared to previous research, providing empirical evidence for cognitive neuroscience theories.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13420 [pdf, other]

NeurCADRecon: Neural Representation for Reconstructing CAD Surfaces by Enforcing Zero Gaussian Curvature

Authors: Qiujie Dong, Rui Xu, Pengfei Wang, Shuangmin Chen, Shiqing Xin, Xiaohong Jia, Wenping Wang, Changhe Tu

Abstract: Despite recent advances in reconstructing an organic model with the neural signed distance function (SDF), the high-fidelity reconstruction of a CAD model directly from low-quality unoriented point clouds remains a significant challenge. In this paper, we address this challenge based on the prior observation that the surface of a CAD model is generally composed of piecewise surface patches, each approximately developable even around the feature line. Our approach, named NeurCADRecon, is self-supervised, and its loss includes a developability term to encourage the Gaussian curvature toward 0 while ensuring fidelity to the input points. Noticing that the Gaussian curvature is non-zero at tip points, we introduce a double-trough curve to tolerate the existence of these tip points. Furthermore, we develop a dynamic sampling strategy to deal with situations where the given points are incomplete or too sparse. Since our resulting neural SDFs can clearly manifest sharp feature points/lines, one can easily extract the feature-aligned triangle mesh from the SDF and then decompose it into smooth surface patches, greatly reducing the difficulty of recovering the parametric CAD design. A comprehensive comparison with existing state-of-the-art methods shows the significant advantage of our approach in reconstructing faithful CAD shapes.

Submitted 20 April, 2024; originally announced April 2024.

Comments: ACM Transactions on Graphics (SIGGRAPH 2024)

arXiv:2404.12726 [pdf, other]

Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works

Authors: Xinfeng Yuan, Siyu Yuan, Yuhan Cui, Tianhe Lin, Xintao Wang, Rui Xu, Jiangjie Chen, Deqing Yang

Abstract: Large language models (LLMs) have demonstrated impressive performance and spurred numerous AI applications, in which role-playing agents (RPAs) are particularly popular, especially for fictional characters. The prerequisite for these RPAs lies in the capability of LLMs to understand characters from fictional works. Previous efforts have evaluated this capability via basic classification tasks or characteristic imitation, failing to capture the nuanced character understanding with LLMs. In this paper, we propose evaluating LLMs' character understanding capability via the character profiling task, i.e., summarizing character profiles from corresponding materials, a widely adopted yet understudied practice for RPA development. Specifically, we construct the CroSS dataset from literature experts and assess the generated profiles by comparing ground truth references and their applicability in downstream tasks. Our experiments, which cover various summarization methods and LLMs, have yielded promising results. These results strongly validate the character understanding capability of LLMs. Resources are available at https://github.com/Joanna0123/character_profiling.

Submitted 2 July, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12705 [pdf, other]

Integrated Sensing and Communication enabled Multiple Base Stations Cooperative UAV Detection

Authors: Xi Lu, Zhiqing Wei, Ruizhong Xu, Lin Wang, Bohao Lu, Jinghui Piao

Abstract: Integrated sensing and communication (ISAC) exhibits notable potential for sensing the unmanned aerial vehicles (UAVs), facilitating real-time monitoring of UAVs for security insurance. Due to the low sensing accuracy of single base stations (BSs), a cooperative UAV sensing method by multi-BS is proposed in this paper to achieve high-accuracy sensing. Specifically, a multiple signal classification (MUSIC)-based symbol-level fusion method is proposed for UAV localization and velocity estimation, consisting of a single-BS preprocessing step and a lattice points searching step. The preprocessing procedure enhances the single-BS accuracy by superposing multiple spectral functions, thereby establishing a reference value for subsequent lattice points searching. Furthermore, the lattice point with minimal error compared to the preprocessing results is determined as the fusion result. Extensive simulation results reveal that the proposed symbol-level fusion method outperforms the benchmarking methods in localization and velocity estimation.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12138 [pdf, other]

Character is Destiny: Can Large Language Models Simulate Persona-Driven Decisions in Role-Playing?

Authors: Rui Xu, Xintao Wang, Jiangjie Chen, Siyu Yuan, Xinfeng Yuan, Jiaqing Liang, Zulong Chen, Xiaoqing Dong, Yanghua Xiao

Abstract: Can Large Language Models substitute humans in making important decisions? Recent research has unveiled the potential of LLMs to role-play assigned personas, mimicking their knowledge and linguistic habits. However, imitative decision-making requires a more nuanced understanding of personas. In this paper, we benchmark the ability of LLMs in persona-driven decision-making. Specifically, we investigate whether LLMs can predict characters' decisions provided with the preceding stories in high-quality novels. Leveraging character analyses written by literary experts, we construct a dataset LIFECHOICE comprising 1,401 character decision points from 395 books. Then, we conduct comprehensive experiments on LIFECHOICE, with various LLMs and methods for LLM role-playing. The results demonstrate that state-of-the-art LLMs exhibit promising capabilities in this task, yet there is substantial room for improvement. Hence, we further propose the CHARMAP method, which achieves a 6.01% increase in accuracy via persona-based memory retrieval. We will make our datasets and code publicly available.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11093 [pdf, other]

Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems

Authors: Long Cao, Liwei Ge, Daochi Zhang, Xiang Li, Yao Wang, Rui-Xue Xu, YiJing Yan, Xiao Zheng

Abstract: Simulating the dynamics of open quantum systems coupled to non-Markovian environments remains an outstanding challenge due to exponentially scaling computational costs. We present an artificial intelligence strategy to overcome this obstacle by integrating the neural quantum states approach into the dissipaton-embedded quantum master equation in second quantization (DQME-SQ). Our approach utilizes restricted Boltzmann machines (RBMs) to compactly represent the reduced density tensor, explicitly encoding the combined effects of system-environment correlations and non-Markovian memory. Applied to model systems exhibiting prominent effects of system-environment correlation and non-Markovian memory, our approach achieves comparable accuracy to conventional hierarchical equations of motion, while requiring significantly fewer dynamical variables. The novel RBM-based DQME-SQ approach paves the way for investigating non-Markovian open quantum dynamics in previously intractable regimes, with implications spanning various frontiers of modern science.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 7 pages, 5 figures

arXiv:2404.10405 [pdf, other]

Integration of Self-Supervised BYOL in Semi-Supervised Medical Image Recognition

Authors: Hao Feng, Yuanzhe Jia, Ruijia Xu, Mukesh Prasad, Ali Anaissi, Ali Braytee

Abstract: Image recognition techniques heavily rely on abundant labeled data, particularly in medical contexts. Addressing the challenges associated with obtaining labeled data has led to the prominence of self-supervised learning and semi-supervised learning, especially in scenarios with limited annotated data. In this paper, we propose an innovative approach by integrating self-supervised learning into semi-supervised models to enhance medical image recognition. Our methodology commences with pre-training on unlabeled data utilizing the BYOL method. Subsequently, we merge pseudo-labeled and labeled datasets to construct a neural network classifier, refining it through iterative fine-tuning. Experimental results on three different datasets demonstrate that our approach optimally leverages unlabeled data, outperforming existing methods in terms of accuracy for medical image recognition.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Accepted by ICCS 2024

arXiv:2404.10399 [pdf, other]

FoundationGrasp: Generalizable Task-Oriented Grasping with Foundation Models

Authors: Chao Tang, Dehao Huang, Wenlong Dong, Ruinian Xu, Hong Zhang

Abstract: Task-oriented grasping (TOG), which refers to the problem of synthesizing grasps on an object that are configurationally compatible with the downstream manipulation task, is the first milestone towards tool manipulation. Analogous to the activation of two brain regions responsible for semantic and geometric reasoning during cognitive processes, modeling the complex relationship between objects, tasks, and grasps requires rich prior knowledge about objects and tasks. Existing methods typically limit the prior knowledge to a closed-set scope and cannot support the generalization to novel objects and tasks out of the training set. To address such a limitation, we propose FoundationGrasp, a foundation model-based TOG framework that leverages the open-ended knowledge from foundation models to learn generalizable TOG skills. Comprehensive experiments are conducted on the contributed Language and Vision Augmented TaskGrasp (LaViA-TaskGrasp) dataset, demonstrating the superiority of FoundationGrasp over existing methods when generalizing to novel object instances, object classes, and tasks out of the training set. Furthermore, the effectiveness of FoundationGrasp is validated in real-robot grasping and manipulation experiments on a 7 DoF robotic arm. Our code, data, appendix, and video are publicly available at https://sites.google.com/view/foundationgrasp.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.09793 [pdf, other]

First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment

Authors: J. X. Liu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints of cross section in the DM range of 0.1--10 keV/$c^2$ for vector and axial-vector interaction. The upper limit on the cross section is set to be $\rm 5.5\times10^{-46}~cm^2$ for vector interaction, and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at DM mass of 5 keV/$c^2$.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures

arXiv:2404.08514 [pdf, other]

NIR-Assisted Image Denoising: A Selective Fusion Approach and A Real-World Benchmark Dataset

Authors: Rongjian Xu, Zhilu Zhang, Renlong Wu, Wangmeng Zuo

Abstract: Despite the significant progress in image denoising, it is still challenging to restore fine-scale details while removing noise, especially in extremely low-light environments. Leveraging near-infrared (NIR) images to assist visible RGB image denoising shows the potential to address this issue, becoming a promising technology. Nonetheless, existing works still struggle with taking advantage of NIR information effectively for real-world image denoising, due to the content inconsistency between NIR-RGB images and the scarcity of real-world paired datasets. To alleviate the problem, we propose an efficient Selective Fusion Module (SFM), which can be plug-and-played into the advanced denoising networks to merge the deep NIR-RGB features. Specifically, we sequentially perform the global and local modulation for NIR and RGB features, and then integrate the two modulated features. Furthermore, we present a Real-world NIR-Assisted Image Denoising (Real-NAID) dataset, which covers diverse scenarios as well as various noise levels. Extensive experiments on both synthetic and our real-world datasets demonstrate that the proposed method achieves better results than state-of-the-art ones. The dataset, codes, and pre-trained models will be publicly available at https://github.com/ronjonxu/NAID.

Submitted 18 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.07965 [pdf, other]

Rho-1: Not All Tokens Are What You Need

Authors: Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

Abstract: Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. When continual pretraining on 15B OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% in 9 math tasks. After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on MATH dataset, respectively - matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when pretraining on 80B general tokens, Rho-1 achieves 6.8% average enhancement across 15 diverse tasks, increasing both efficiency and performance of the language model pre-training.

Submitted 23 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: First two authors equal contribution

arXiv:2404.07950 [pdf, other]

Reinforcement Learning with Generalizable Gaussian Splatting

Authors: Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu

Abstract: An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations contain several drawbacks. They either cannot describe complex local geometries, cannot generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.

Submitted 18 March, 2024; originally announced April 2024.

Comments: 7 pages, 2 figures

arXiv:2404.07054 [pdf, other]

Quantum Mechanics of Open Systems in Non-Inertial Motion

Authors: Zi-Fan Zhu, Yu Su, Yao Wang, Rui-Xue Xu, YiJing Yan

Abstract: The study of quantum mechanics in non-inertial reference frames, particularly in the context of open systems, introduces several intriguing phenomena and challenges. This paper presents a comprehensive framework for analyzing the quantum mechanics of open systems undergoing non-inertial motion. Our methodology leverages the concept of dissipatons, statistical quasi-particles that capture collective dissipative effects from the environment. We demonstrate that our approach offers a natural understanding of the intricate dynamics among non-inertial effects, decoherence, dissipation, and system-bath entanglement. Specifically, we conduct demonstrations focusing on the Lamb shift phenomenon within a rotating ring cavity. Through theoretical exposition and practical applications, our framework elucidates the profound interplay between open quantum dynamics and non-inertial motion, paving the way for advancements in quantum information processing and sensing technologies.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 7 pages, 1 figure

arXiv:2404.06449 [pdf, other]

Deep-Learning Database of Density Functional Theory Hamiltonians for Twisted Materials

Authors: Ting Bao, Runzhang Xu, He Li, Xiaoxun Gong, Zechen Tang, Jingheng Fu, Wenhui Duan, Yong Xu

Abstract: Moiré-twisted materials have garnered significant research interest due to their distinctive properties and intriguing physics. However, conducting first-principles studies on such materials faces challenges, notably the formidable computational cost associated with simulating ultra-large twisted structures. This obstacle impedes the construction of a twisted materials database crucial for data-driven materials discovery. Here, by using high-throughput calculations and state-of-the-art neural network methods, we construct a Deep-learning Database of density functional theory (DFT) Hamiltonians for Twisted materials named DDHT. The DDHT database comprises trained neural-network models of over a hundred homo-bilayer and hetero-bilayer moiré-twisted materials. These models enable accurate prediction of the DFT Hamiltonian for these materials across arbitrary twist angles, with an averaged mean absolute error of approximately 1.0 meV or lower. The database facilitates the exploration of flat bands and correlated materials platforms within ultra-large twisted structures.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06364 [pdf, other]

SurveyAgent: A Conversational System for Personalized and Efficient Research Survey

Authors: Xintao Wang, Jiangjie Chen, Nianqi Li, Lida Chen, Xinfeng Yuan, Wei Shi, Xuyang Ge, Rui Xu, Yanghua Xiao

Abstract: In the rapidly advancing research fields such as AI, managing and staying abreast of the latest scientific literature has become a significant challenge for researchers. Although previous efforts have leveraged AI to assist with literature searches, paper recommendations, and question-answering, a comprehensive support system that addresses the holistic needs of researchers has been lacking. This paper introduces SurveyAgent, a novel conversational system designed to provide personalized and efficient research survey assistance to researchers. SurveyAgent integrates three key modules: Knowledge Management for organizing papers, Recommendation for discovering relevant literature, and Query Answering for engaging with content on a deeper level. This system stands out by offering a unified platform that supports researchers through various stages of their literature review process, facilitated by a conversational interface that prioritizes user interaction and personalization. Our evaluation demonstrates SurveyAgent's effectiveness in streamlining research activities, showcasing its capability to facilitate how researchers interact with scientific literature.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 6 pages

arXiv:2404.04869 [pdf, other]

Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs

Authors: Yiqun Duan, Qiang Zhang, Renjing Xu

Abstract: The utilization of Large Language Models (LLMs) within the realm of reinforcement learning, particularly as planners, has garnered a significant degree of attention in recent scholarly literature. However, a substantial proportion of existing research predominantly focuses on planning models for robotics that transmute the outputs derived from perception models into linguistic forms, thus adopting a 'pure-language' strategy. In this research, we propose a hybrid End-to-End learning framework for autonomous driving by combining basic driving imitation learning with LLMs based on multi-modality prompt tokens. Instead of simply converting perception results from the separated train model into pure language input, our novelty lies in two aspects. 1) The end-to-end integration of visual and LiDAR sensory input into learnable multi-modality tokens, thereby intrinsically alleviating description bias by separated pre-trained perception models. 2) Instead of directly letting LLMs drive, this paper explores a hybrid setting of letting LLMs help the driving model correct mistakes and complicated scenarios. The results of our experiments suggest that the proposed methodology can attain driving scores of 49.21%, coupled with an impressive route completion rate of 91.34% in the offline evaluation conducted via CARLA. These performance metrics are comparable to the most advanced driving models.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04851 [pdf]

Site-ordering/disordering-induced magnetic textures in a vdW ferromagnet by competing global and broken inversion-symmetry

Authors: Haoyan Zhang, Jianfeng Guo, Cong Wang, Le Lei, Shuo Mi, Songyang Li, Congkuan Tian, Shaohua Yan, Hanxiang Wu, Shiyu Zhu, Rui Xu, Xueyun Wang, Hechang Lei, Peng Cheng, Fei Pang, Wei Ji, Zhihai Cheng

Abstract: Fe5GeTe2 single crystals can be divided into nonquenched (NQ) and quench-cooled (QC) phases with different magnetic properties. A comprehensive understanding of the magnetic property variations in the NQ and QC phases is imperative for guiding Fe5GeTe2 towards spintronics applications; however, it remains elusive. Here, we report a real-space study on the structural and magnetic properties of these two magnetic phases using cryogenic magnetic force microscopy and scanning tunneling microscopy. The thermal history introduces disorder and order to the Fe(1) sites, resulting in the NQ and QC phases exhibiting global and broken inversion symmetry, respectively. The observed magnetic domain transitions (branching to labyrinthine) in the spin reorientation process and the distinct 3D spin textures stabilized by magnetic dipolar interaction observed in field-dependent studies allow the NQ phase to exhibit a more resilient global magnetic state. In contrast, the QC phase exhibits enhanced magnetic anisotropy, resulting in a higher TC. Meanwhile, the Dzyaloshinskii-Moriya interaction (DMI) introduced by the broken inversion symmetry causes the QC phase to exhibit a localized magnetic state: no domain transformation occurs during spin reorientation, and irregular domain states are observed in field-related studies. Our work provides an important reference for understanding the complex magnetic properties in Fe5GeTe2.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 18 pages, 4 figures

arXiv:2404.04804 [pdf, other]

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

Authors: Jinlong Li, Baolu Li, Zhengzhong Tu, Xinyu Liu, Qing Guo, Felix Juefei-Xu, Runsheng Xu, Hongkai Yu

Abstract: Vision-centric perception systems for autonomous driving have gained considerable attention recently due to their cost-effectiveness and scalability, especially compared to LiDAR-based systems. However, these systems often struggle in low-light conditions, potentially compromising their performance and safety. To address this, our paper introduces LightDiff, a domain-tailored framework designed to enhance the low-light image quality for autonomous driving applications. Specifically, we employ a multi-condition controlled diffusion model. LightDiff works without any human-collected paired data, leveraging a dynamic data degradation process instead. It incorporates a novel multi-condition adapter that adaptively controls the input weights from different modalities, including depth maps, RGB images, and text captions, to effectively illuminate dark scenes while maintaining context consistency. Furthermore, to align the enhanced images with the detection model's knowledge, LightDiff employs perception-specific scores as rewards to guide the diffusion training process through reinforcement learning. Extensive experiments on the nuScenes datasets demonstrate that LightDiff can significantly improve the performance of several state-of-the-art 3D detectors in night-time conditions while achieving high visual quality scores, highlighting its potential to safeguard autonomous driving.

Submitted 7 April, 2024; originally announced April 2024.

Comments: This paper is accepted by CVPR 2024

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04792 [pdf, other]

doi 10.1145/3649329.3656259

GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling

Authors: Runzhen Xue, Mingyu Yan, Dengke Han, Yihan Teng, Zhimin Tang, Xiaochun Ye, Dongrui Fan

Abstract: Heterogeneous Graph Neural Networks (HGNNs) have broadened the applicability of graph representation learning to heterogeneous graphs. However, the irregular memory access pattern of HGNNs leads to the buffer thrashing issue in HGNN accelerators. In this work, we identify an opportunity to address buffer thrashing in HGNN acceleration through an analysis of the topology of heterogeneous graphs. To harvest this opportunity, we propose a graph restructuring method and map it into a hardware frontend named GDR-HGNN. GDR-HGNN dynamically restructures the graph on the fly to enhance data locality for HGNN accelerators. Experimental results demonstrate that, with the assistance of GDR-HGNN, a leading HGNN accelerator achieves an average speedup of 14.6 times and 1.78 times compared to the state-of-the-art software framework running on A100 GPU and itself, respectively.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 6 pages, 10 figures, accepted by DAC'61

arXiv:2404.00628 [pdf, other]

Fluid Antenna Relay Assisted Communication Systems Through Antenna Location Optimization

Authors: Ruopeng Xu, Yixuan Chen, Jiawen Kang, Minrui Xu, Zhaohui Yang, Chongwen Huang, Dusit Niyato

Abstract: In this paper, we investigate the problem of resource allocation for fluid antenna relay (FAR) system with antenna location optimization. In the considered model, each user transmits information to a base station (BS) with help of FAR. The antenna location of the FAR is flexible and can be adapted to dynamic location distribution of the users. We formulate a sum rate maximization problem through jointly optimizing the antenna location and bandwidth allocation with meeting the minimum rate requirements, total bandwidth budget, and feasible antenna region constraints. To solve this problem, we obtain the optimal bandwidth in closed form. Based on the optimal bandwidth, the original problem is reduced to the antenna location optimization problem and an alternating algorithm is proposed. Simulation results verify the effectiveness of the proposed algorithm and the sum rate can be increased by up to 125% compared to the conventional schemes.

Submitted 27 June, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00612 [pdf, other]

Resource Allocation for Green Probabilistic Semantic Communication with Rate Splitting

Authors: Ruopeng Xu, Zhaohui Yang, Zhouxiang Zhao, Qianqian Yang, Zhaoyang Zhang

Abstract: In this paper, the energy efficient design for probabilistic semantic communication (PSC) system with rate splitting multiple access (RSMA) is investigated. Basic principles are first reviewed to show how the PSC system works to extract, compress and transmit the semantic information in a task-oriented transmission. Subsequently, the process of how multiuser semantic information can be represented, compressed and transmitted with RSMA is presented, during which the semantic compression ratio (SCR) is introduced to directly measure the computation overhead in a transmission task, and communication overhead is indirectly described as well. Hence, the problem of wireless resource allocation jointly considering the computation and communication consumption for the PSC system with RSMA is investigated. Both conventional wireless resource constraints and unique constraints on semantic communication are considered to maximize the energy efficiency (EE). Simulation results verify the effectiveness of the proposed scheme.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.20276 [pdf, other]

Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment

Authors: R. Xu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (59 additional authors not shown)

Abstract: We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures

Showing 51–100 of 1,564 results for author: Xu, R