Search | arXiv e-print repository

arXiv:2403.19091 [pdf, other]

Observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (600 additional authors not shown)

Abstract: By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fra… ▽ More By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fractions are measured to be $\mathcal{B}(D^0\rightarrow {K}_1(1270)^-(\to K^0_Sπ^-π^0)e^+ν_e)=(1.69^{+0.53}_{-0.46}\pm0.15)\times10^{-4}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0(\to K^0_Sπ^+π^-)e^+ν_e)=(1.47^{+0.45}_{-0.40}\pm0.20)\times10^{-4}$ with statistical significance of 5.4$σ$ and 5.6$σ$, respectively. When combined with measurements of the $K_1(1270)\to K^+π^-π$ decays, the absolute branching fractions are determined to be $\mathcal{B}(D^0\to K_1(1270)^-e^+ν_e)=(1.05^{+0.33}_{-0.28}\pm0.12\pm0.12)\times10^{-3}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0e^+ν_e)=(1.29^{+0.40}_{-0.35}\pm0.18\pm0.15)\times10^{-3}$. The first and second uncertainties are statistical and systematic, respectively, and the third uncertainties originate from the assumed branching fractions of the $K_1(1270)\to Kππ$ decays. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: 19pages

arXiv:2403.19066 [pdf, other]

Generative Quanta Color Imaging

Authors: Vishal Purohit, Junjie Luo, Yiheng Chi, Qi Guo, Stanley H. Chan, Qiang Qiu

Abstract: The astonishing development of single-photon cameras has created an unprecedented opportunity for scientific and industrial imaging. However, the high data throughput generated by these 1-bit sensors creates a significant bottleneck for low-power applications. In this paper, we explore the possibility of generating a color image from a single binary frame of a single-photon camera. We evidently fi… ▽ More The astonishing development of single-photon cameras has created an unprecedented opportunity for scientific and industrial imaging. However, the high data throughput generated by these 1-bit sensors creates a significant bottleneck for low-power applications. In this paper, we explore the possibility of generating a color image from a single binary frame of a single-photon camera. We evidently find this problem being particularly difficult to standard colorization approaches due to the substantial degree of exposure variation. The core innovation of our paper is an exposure synthesis model framed under a neural ordinary differential equation (Neural ODE) that allows us to generate a continuum of exposures from a single observation. This innovation ensures consistent exposure in binary images that colorizers take on, resulting in notably enhanced colorization. We demonstrate applications of the method in single-image and burst colorization and show superior generative performance over baselines. Project website can be found at https://vishal-s-p.github.io/projects/2023/generative_quanta_color.html. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted at IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2403.18554 [pdf, other]

CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection

Authors: Jiayi Zhu, Qing Guo, Felix Juefei-Xu, Yihao Huang, Yang Liu, Geguang Pu

Abstract: Co-salient object detection (CoSOD) aims to identify the common and salient (usually in the foreground) regions across a given group of images. Although achieving significant progress, state-of-the-art CoSODs could be easily affected by some adversarial perturbations, leading to substantial accuracy reduction. The adversarial perturbations can mislead CoSODs but do not change the high-level semant… ▽ More Co-salient object detection (CoSOD) aims to identify the common and salient (usually in the foreground) regions across a given group of images. Although achieving significant progress, state-of-the-art CoSODs could be easily affected by some adversarial perturbations, leading to substantial accuracy reduction. The adversarial perturbations can mislead CoSODs but do not change the high-level semantic information (e.g., concept) of the co-salient objects. In this paper, we propose a novel robustness enhancement framework by first learning the concept of the co-salient objects based on the input group images and then leveraging this concept to purify adversarial perturbations, which are subsequently fed to CoSODs for robustness enhancement. Specifically, we propose CosalPure containing two modules, i.e., group-image concept learning and concept-guided diffusion purification. For the first module, we adopt a pre-trained text-to-image diffusion model to learn the concept of co-salient objects within group images where the learned concept is robust to adversarial examples. For the second module, we map the adversarial image to the latent space and then perform diffusion generation by embedding the learned concept into the noise prediction function as an extra condition. Our method can effectively alleviate the influence of the SOTA adversarial attack containing different adversarial patterns, including exposure and noise. The extensive results demonstrate that our method could enhance the robustness of CoSODs significantly. △ Less

Submitted 11 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: This paper is accepted by CVPR 2024

arXiv:2403.17372 [pdf, other]

An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders

Authors: Youhua Li, Hanwen Du, Yongxin Ni, Yuanqi He, Junchen Fu, Xiangyan Liu, Qi Guo

Abstract: Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions. While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals, like text and images, has inspired researchers to delve into constructing SR from multi-modal information without using IDs. However, the complexity of multi-modal… ▽ More Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions. While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals, like text and images, has inspired researchers to delve into constructing SR from multi-modal information without using IDs. However, the complexity of multi-modal learning manifests in diverse feature extractors, fusion methods, and pre-trained models. Consequently, designing a simple and universal \textbf{M}ulti-\textbf{M}odal \textbf{S}equential \textbf{R}ecommendation (\textbf{MMSR}) framework remains a formidable challenge. We systematically summarize the existing multi-modal related SR methods and distill the essence into four core components: visual encoder, text encoder, multimodal fusion module, and sequential architecture. Along these dimensions, we dissect the model designs, and answer the following sub-questions: First, we explore how to construct MMSR from scratch, ensuring its performance either on par with or exceeds existing SR methods without complex techniques. Second, we examine if MMSR can benefit from existing multi-modal pre-training paradigms. Third, we assess MMSR's capability in tackling common challenges like cold start and domain transferring. Our experiment results across four real-world recommendation scenarios demonstrate the great potential ID-agnostic multi-modal sequential recommendation. Our framework can be found at: https://github.com/MMSR23/MMSR. △ Less

Submitted 30 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders

arXiv:2403.17297 [pdf, other]

InternLM2 Technical Report

Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k ``Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16935 [pdf, other]

Measuring Spectral Form Factor in Many-Body Chaotic and Localized Phases of Quantum Processors

Authors: Hang Dong, Pengfei Zhang, Ceren B. Dag, Yu Gao, Ning Wang, **feng Deng, Xu Zhang, Jiachen Chen, Shibo Xu, Ke Wang, Yaozu Wu, Chuanyu Zhang, Feitong **, Xuhao Zhu, Aosai Zhang, Yiren Zou, Ziqi Tan, Zhengyi Cui, Zitian Zhu, Fanhao Shen, Tingting Li, Jiarun Zhong, Zehang Bao, Hekang Li, Zhen Wang , et al. (6 additional authors not shown)

Abstract: The spectral form factor (SFF) captures universal spectral fluctuations as signatures of quantum chaos, and has been instrumental in advancing multiple frontiers of physics including the studies of black holes and quantum many-body systems. However, the measurement of SFF in many-body systems is challenging due to the difficulty in resolving level spacings that become exponentially small with incr… ▽ More The spectral form factor (SFF) captures universal spectral fluctuations as signatures of quantum chaos, and has been instrumental in advancing multiple frontiers of physics including the studies of black holes and quantum many-body systems. However, the measurement of SFF in many-body systems is challenging due to the difficulty in resolving level spacings that become exponentially small with increasing system size. Here we experimentally measure the SFF to probe the presence or absence of chaos in quantum many-body systems using a superconducting quantum processor with a randomized measurement protocol. For a Floquet chaotic system, we observe signatures of spectral rigidity of random matrix theory in SFF given by the ramp-plateau behavior. For a Hamiltonian system, we utilize SFF to distinguish the quantum many-body chaotic phase and the prethermal many-body localization. We observe the dip-ramp-plateau behavior of random matrix theory in the chaotic phase, and contrast the scaling of the plateau time in system size between the many-body chaotic and localized phases. Furthermore, we probe the eigenstate statistics by measuring a generalization of the SFF, known as the partial SFF, and observe distinct behaviors in the purities of the reduced density matrix in the two phases. This work unveils a new way of extracting the universal signatures of many-body quantum chaos in quantum devices by probing the correlations in eigenenergies and eigenstates. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: 12 pages, 9 figures

arXiv:2403.16811 [pdf, ps, other]

Cross section measurement of $e^+e^-\to ηψ(2S)$ and search for $e^+e^-\toη\tilde{X}(3872)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass en… ▽ More The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass energies, upper limits at the 90\% confidence level on the cross section for $e^+e^-\toηψ(2S)$ and on the product of the $e^+e^-\toη\tilde{X}(3872)$ cross section with the branching fraction of $\tilde{X}(3872)\toπ^+π^- J/ψ$ are reported. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16494 [pdf, other]

CT-Bound: Robust Boundary Detection From Noisy Images Via Hybrid Convolution and Transformer Neural Networks

Authors: Wei Xu, Junjie Luo, Qi Guo

Abstract: We present CT-Bound, a robust and fast boundary detection method for very noisy images using a hybrid Convolution and Transformer neural network. The proposed architecture decomposes boundary estimation into two tasks: local detection and global regularization. During the local detection, the model uses a convolutional architecture to predict the boundary structure of each image patch in the form… ▽ More We present CT-Bound, a robust and fast boundary detection method for very noisy images using a hybrid Convolution and Transformer neural network. The proposed architecture decomposes boundary estimation into two tasks: local detection and global regularization. During the local detection, the model uses a convolutional architecture to predict the boundary structure of each image patch in the form of a pre-defined local boundary representation, the field-of-junctions (FoJ). Then, it uses a feed-forward transformer architecture to globally refine the boundary structures of each patch to generate an edge map and a smoothed color map simultaneously. Our quantitative analysis shows that CT-Bound outperforms the previous best algorithms in edge detection on very noisy images. It also increases the edge detection accuracy of FoJ-based methods while having a 3-time speed improvement. Finally, we demonstrate that CT-Bound can produce boundary and color maps on real captured images without extra fine-tuning and real-time boundary map and color map videos at ten frames per second. △ Less

Submitted 25 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.14998 [pdf, other]

Precise measurement of the $e^+e^-\to D_s^+D_s^-$ cross sections at center-of-mass energies from threshold to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel… ▽ More Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel analysis and model tests, which are critical to understand vector charmonium-like states with masses between 4 and 5~GeV. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures, published to PRL

arXiv:2403.14778 [pdf, other]

Diffusion Attack: Leveraging Stable Diffusion for Naturalistic Image Attacking

Authors: Qianyu Guo, Jiaming Fu, Yawen Lu, Dongming Gan

Abstract: In Virtual Reality (VR), adversarial attack remains a significant security threat. Most deep learning-based methods for physical and digital adversarial attacks focus on enhancing attack performance by crafting adversarial examples that contain large printable distortions that are easy for human observers to identify. However, attackers rarely impose limitations on the naturalness and comfort of t… ▽ More In Virtual Reality (VR), adversarial attack remains a significant security threat. Most deep learning-based methods for physical and digital adversarial attacks focus on enhancing attack performance by crafting adversarial examples that contain large printable distortions that are easy for human observers to identify. However, attackers rarely impose limitations on the naturalness and comfort of the appearance of the generated attack image, resulting in a noticeable and unnatural attack. To address this challenge, we propose a framework to incorporate style transfer to craft adversarial inputs of natural styles that exhibit minimal detectability and maximum natural appearance, while maintaining superior attack capabilities. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted to IEEE VRW

arXiv:2403.14734 [pdf, other]

A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

Authors: Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol… ▽ More Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of Large Language Models). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations spanning through different stages. For applications, we also observe a co-evolving shift. It spans from initial endeavors to tackling specific scenarios, through exploring a diverse array of tasks during its rapid expansion, to currently focusing on tackling increasingly complex and varied real-world challenges. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we delve into both the opportunities and challenges associated with this field, alongside elucidating our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/NCISurvey. △ Less

Submitted 23 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 64 pages, 6 figures, 10 tables, 692 references

arXiv:2403.13437 [pdf, other]

Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be… ▽ More Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level. △ Less

Submitted 14 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12766 [pdf, other]

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Authors: Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Xiangkun Hu, Zheng Zhang, Qian Wang, Yue Zhang

Abstract: The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to tes… ▽ More The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to test the capabilities of LLMs with extended texts. Constructed from English novels, NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper presents the design and construction of NovelQA, highlighting its manual annotation, and diverse question types. Our evaluation of Long-context LLMs on NovelQA reveals significant insights into the models' performance, particularly emphasizing the challenges they face with multi-hop reasoning, detail-oriented questions, and extremely long input with an average length more than 200,000 tokens. The results underscore the necessity for further advancements in LLMs to improve their long-context comprehension. △ Less

Submitted 17 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.12445 [pdf, other]

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

Authors: Sensen Gao, Xiaojun Jia, Xuhong Ren, Ivor Tsang, Qing Guo

Abstract: Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs). Strengthening adversarial attacks and uncovering vulnerabilities, especially common issues in VLP models (e.g., high transferable AEs), can stimulate further research on constructing reliable and practical VLP models.… ▽ More Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs). Strengthening adversarial attacks and uncovering vulnerabilities, especially common issues in VLP models (e.g., high transferable AEs), can stimulate further research on constructing reliable and practical VLP models. A recent work (i.e., Set-level guidance attack) indicates that augmenting image-text pairs to increase AE diversity along the optimization path enhances the transferability of adversarial examples significantly. However, this approach predominantly emphasizes diversity around the online adversarial examples (i.e., AEs in the optimization period), leading to the risk of overfitting the victim model and affecting the transferability. In this study, we posit that the diversity of adversarial examples towards the clean input and online AEs are both pivotal for enhancing transferability across VLP models. Consequently, we propose using diversification along the intersection region of adversarial trajectory to expand the diversity of AEs. To fully leverage the interaction between modalities, we introduce text-guided adversarial example selection during optimization. Furthermore, to further mitigate the potential overfitting, we direct the adversarial text deviating from the last intersection region along the optimization path, rather than adversarial images as in existing methods. Extensive experiments affirm the effectiveness of our method in improving transferability across various VLP models and downstream vision-and-language tasks (e.g., Image-Text Retrieval(ITR), Visual Grounding(VG), Image Captioning(IC)). △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11935 [pdf, other]

doi 10.1364/OE.508017

HyperColorization: Propagating spatially sparse noisy spectral clues for reconstructing hyperspectral images

Authors: M. Kerem Aydin, Qi Guo, Emma Alexander

Abstract: Hyperspectral cameras face challenging spatial-spectral resolution trade-offs and are more affected by shot noise than RGB photos taken over the same total exposure time. Here, we present a colorization algorithm to reconstruct hyperspectral images from a grayscale guide image and spatially sparse spectral clues. We demonstrate that our algorithm generalizes to varying spectral dimensions for hype… ▽ More Hyperspectral cameras face challenging spatial-spectral resolution trade-offs and are more affected by shot noise than RGB photos taken over the same total exposure time. Here, we present a colorization algorithm to reconstruct hyperspectral images from a grayscale guide image and spatially sparse spectral clues. We demonstrate that our algorithm generalizes to varying spectral dimensions for hyperspectral images, and show that colorizing in a low-rank space reduces compute time and the impact of shot noise. To enhance robustness, we incorporate guided sampling, edge-aware filtering, and dimensionality estimation techniques. Our method surpasses previous algorithms in various performance metrics, including SSIM, PSNR, GFC, and EMD, which we analyze as metrics for characterizing hyperspectral image quality. Collectively, these findings provide a promising avenue for overcoming the time-space-wavelength resolution trade-off by reconstructing a dense hyperspectral image from samples obtained by whisk or push broom scanners, as well as hybrid spatial-spectral computational imaging systems. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: 16 Pages, 13 Figures, 3 Tables, for more information: https://mehmetkeremaydin.github.io/hypercolorization/

ACM Class: I.4.5

Journal ref: Optics Express, Vol:7, year:2024, p:10761-10776

arXiv:2403.10877 [pdf, ps, other]

Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (637 additional authors not shown)

Abstract: We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an a… ▽ More We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominated $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: 9 pages, 3 figures

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen , et al. (256 additional authors not shown)

Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at… ▽ More We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is reversal to the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components. △ Less

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)

arXiv:2403.06766 [pdf, other]

Determination of the number of $ψ(3686)$ events taken at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be… ▽ More The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$. △ Less

Submitted 28 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.05248 [pdf, other]

doi 10.3847/2041-8213/ad4986

JWST's first glimpse of a z > 2 forming cluster reveals a top-heavy stellar mass function

Authors: Hanwen Sun, Tao Wang, Ke Xu, Emanuele Daddi, Qing Gu, Tadayuki Kodama, Anita Zanella, David Elbaz, Ichi Tanaka, Raphael Gobat, Qi Guo, Jiaxin Han, Shiying Lu, Luwenjia Zhou

Abstract: Clusters and their progenitors (protoclusters) at z = 2-4, the peak epoch of star formation, are ideal laboratories to study the formation process of both the clusters themselves and their member galaxies. However, a complete census of their member galaxies has been challenging due to observational difficulties. Here we present new JWST/NIRCam observations targeting the distant cluster CLJ1001 at… ▽ More Clusters and their progenitors (protoclusters) at z = 2-4, the peak epoch of star formation, are ideal laboratories to study the formation process of both the clusters themselves and their member galaxies. However, a complete census of their member galaxies has been challenging due to observational difficulties. Here we present new JWST/NIRCam observations targeting the distant cluster CLJ1001 at z = 2.51 from the COSMOS-Web program, which, in combination with previous narrowband imaging targeting H-alpha emitters and deep millimeter surveys of CO emitters, provide a complete view of massive galaxy assembly in CLJ1001. In particular, JWST reveals a population of massive, extremely red cluster members in the long-wavelength bands that were invisible in previous Hubble Space Telescope (HST)/F160W imaging (HST-dark members). Based on this highly complete spectroscopic sample of member galaxies, we show that the spatial distribution of galaxies in CLJ1001 exhibits a strong central concentration, with the central galaxy density already resembling that of low-z clusters. Moreover, we reveal a "top-heavy" stellar mass function for the star-forming galaxies (SFGs), with an overabundance of massive SFGs piled up in the cluster core. These features strongly suggest that CLJ1001 is caught in a rapid transition, with many of its massive SFGs likely soon becoming quiescent. In the context of cluster formation, these findings suggest that the earliest clusters form from the inside out and top to bottom, with the massive galaxies in the core assembling first, followed by the less massive ones in the outskirts. △ Less

Submitted 29 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 14 pages, 9 figures, 1 table, published by ApJL

Journal ref: ApJL, 967, L34 (2024)

arXiv:2403.03579 [pdf, other]

Testing the unified bounds of quantum speed limit

Authors: Yaozu Wu, Jiale Yuan, Chuanyu Zhang, Zitian Zhu, **feng Deng, Xu Zhang, Pengfei Zhang, Qiujiang Guo, Zhen Wang, Jiehui Huang, Chao Song, Hekang Li, Da-Wei Wang, H. Wang, Girish S. Agarwal

Abstract: Quantum speed limits (QSLs) impose fundamental constraints on the evolution speed of quantum systems. Traditionally, the Mandelstam-Tamm (MT) and Margolus-Levitin (ML) bounds have been widely employed, relying on the standard deviation and mean of energy distribution to define the QSLs. However, these universal bounds only offer loose restrictions on the quantum evolution. Here we introduce the ge… ▽ More Quantum speed limits (QSLs) impose fundamental constraints on the evolution speed of quantum systems. Traditionally, the Mandelstam-Tamm (MT) and Margolus-Levitin (ML) bounds have been widely employed, relying on the standard deviation and mean of energy distribution to define the QSLs. However, these universal bounds only offer loose restrictions on the quantum evolution. Here we introduce the generalized ML bounds, which prove to be more stringent in constraining dynamic evolution, by utilizing moments of energy spectra of arbitrary orders, even noninteger orders. To validate our findings, we conduct experiments in a superconducting circuit, where we have the capability to prepare a wide range of quantum photonic states and rigorously test these bounds by measuring the evolution of the system and its photon statistics using quantum state tomography. While, in general, the MT bound is effective for short-time evolution, we identify specific parameter regimes where either the MT or the generalized ML bounds suffice to constrain the entire evolution. Our findings not only establish new criteria for estimating QSLs but also substantially enhance our comprehension of the dynamic evolution of quantum systems. △ Less

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03558 [pdf, other]

Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

Authors: Yuhong Sun, Zhangyue Yin, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Hui Zhao

Abstract: Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks. However, they are susceptible to producing unreliable conjectures in ambiguous contexts called hallucination. This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP). To support this approach, we innovatively de… ▽ More Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks. However, they are susceptible to producing unreliable conjectures in ambiguous contexts called hallucination. This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP). To support this approach, we innovatively develop a dataset called Unanswerable Math Word Problem (UMWP) which comprises 5200 questions across five categories. We developed an evaluation methodology combining text similarity and mathematical expression detection to determine whether LLM considers the question unanswerable. The results of extensive experiments conducted on 31 LLMs, including GPT-3, InstructGPT, LLaMA, and Claude, demonstrate that in-context learning and reinforcement learning with human feedback (RLHF) training significantly enhance the model's ability to avoid hallucination. We show that utilizing MWP is a reliable and effective approach to assess hallucination. Our code and data are available at https://github.com/Yuki-Asuuna/UMWP. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 8 figures, accepted by LREC-Coling 2024

arXiv:2403.03500 [pdf, other]

Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to… ▽ More Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 3 figures

arXiv:2403.02330 [pdf, other]

RegionGPT: Towards Region Understanding Vision Language Model

Authors: Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, ** Luo, Sifei Liu

Abstract: Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions. To address this, we introduce RegionGPT (short… ▽ More Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions. To address this, we introduce RegionGPT (short as RGPT), a novel framework designed for complex region-level captioning and understanding. RGPT enhances the spatial awareness of regional representation with simple yet effective modifications to existing visual encoders in VLMs. We further improve performance on tasks requiring a specific output scope by integrating task-guided instruction prompts during both training and inference phases, while maintaining the model's versatility for general-purpose tasks. Additionally, we develop an automated region caption data generation pipeline, enriching the training set with detailed region-level captions. We demonstrate that a universal RGPT model can be effectively applied and significantly enhancing performance across a range of region-level tasks, including but not limited to complex region descriptions, reasoning, object classification, and referring expressions comprehension. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.01971 [pdf, other]

ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs

Authors: Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du, Qi Guo

Abstract: Automated Program Repair (APR) aims to automatically generate patches for rectifying software bugs. Recent strides in Large Language Models (LLM), such as ChatGPT, have yielded encouraging outcomes in APR, especially within the conversation-driven APR framework. Nevertheless, the efficacy of conversation-driven APR is contingent on the quality of the feedback information. In this paper, we propose… ▽ More Automated Program Repair (APR) aims to automatically generate patches for rectifying software bugs. Recent strides in Large Language Models (LLM), such as ChatGPT, have yielded encouraging outcomes in APR, especially within the conversation-driven APR framework. Nevertheless, the efficacy of conversation-driven APR is contingent on the quality of the feedback information. In this paper, we propose ContrastRepair, a novel conversation-based APR approach that augments conversation-driven APR by providing LLMs with contrastive test pairs. A test pair consists of a failing test and a passing test, which offer contrastive feedback to the LLM. Our key insight is to minimize the difference between the generated passing test and the given failing test, which can better isolate the root causes of bugs. By providing informative and specific feedback, ContrastRepair enables the LLM to produce effective bug fixes. The implementation of ContrastRepair is based on the state-of-the-art LLM, ChatGPT, and it iteratively interacts with ChatGPT until plausible patches are generated. We evaluate ContrastRepair on multiple benchmark datasets, including Defects4j, QuixBugs, and HumanEval-Java. The results demonstrate that ContrastRepair significantly outperforms existing methods, achieving a new state-of-the-art in program repair. For instance, among Defects4j 1.2 and 2.0, ContrastRepair correctly repairs 143 out of all 337 bug cases, while the best-performing baseline fixes 124 bugs. △ Less

Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01761 [pdf, other]

Observation of $ψ(3686)\to 3φ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (645 additional authors not shown)

Abstract: Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant str… ▽ More Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.00758 [pdf, other]

Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training

Authors: Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, Yujiu Yang

Abstract: While large language models (LLMs) have achieved impressive performance across diverse tasks, recent studies showcase that causal LLMs suffer from the "reversal curse". It is a typical example that the model knows "A's father is B", but is unable to reason "B's child is A". This limitation poses a challenge to the advancement of artificial general intelligence (AGI), as it suggests a gap in the mo… ▽ More While large language models (LLMs) have achieved impressive performance across diverse tasks, recent studies showcase that causal LLMs suffer from the "reversal curse". It is a typical example that the model knows "A's father is B", but is unable to reason "B's child is A". This limitation poses a challenge to the advancement of artificial general intelligence (AGI), as it suggests a gap in the models' ability to comprehend and apply bidirectional reasoning. In this paper, we first conduct substantial evaluation and identify that the root cause of the reversal curse lies in the different word order between the training and inference stage, namely, the poor ability of causal language models to predict antecedent words within the training data. Accordingly, permutation on the training data is considered as a potential solution, since this can make the model predict antecedent words or tokens. However, previous permutation methods may disrupt complete phrases or entities, thereby posing challenges for the model to comprehend and learn from training data. To address this issue, we propose Semantic-aware Permutation Training (SPT), which addresses this issue by segmenting the training sentences into semantic units (i.e., entities or phrases) with an assistant language model and permuting these units before feeding into the model. Extensive experiments demonstrate that SPT effectively mitigates the reversal curse since the performance on reversed questions approximates that on the forward ones, and significantly advances the performance of existing works. △ Less

Submitted 20 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.16319 [pdf, other]

Data-freeWeight Compress and Denoise for Large Language Models

Authors: Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin

Abstract: Large Language Models (LLMs) are resha** the research landscape in artificial intelligence, particularly as model parameters scale up significantly, unlocking remarkable capabilities across various domains. Nevertheless, the scalability of model parameters faces constraints due to limitations in GPU memory and computational speed. To address these constraints, various weight compression methods… ▽ More Large Language Models (LLMs) are resha** the research landscape in artificial intelligence, particularly as model parameters scale up significantly, unlocking remarkable capabilities across various domains. Nevertheless, the scalability of model parameters faces constraints due to limitations in GPU memory and computational speed. To address these constraints, various weight compression methods have emerged, such as Pruning and Quantization. Given the low-rank nature of weight matrices in language models, the reduction of weights through matrix decomposition undoubtedly holds significant potential and promise. In this paper, drawing upon the intrinsic structure of LLMs, we propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices. Significantly, our method is characterized by without necessitating additional involvement of any corpus, while simultaneously preserving orthogonality in conjunction with pruning and quantization methods. We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data. Additionally, we explore the fundamental properties of the weight matrix of LLMs undergone Rank-k Approximation and conduct comprehensive experiments to elucidate our hypothesis. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.16088 [pdf]

Origin of giant magnetoresistance in layered nodal-line semimetal TaNiTe5 nanoflakes

Authors: Ding-Bang Zhou, Kuang-Hong Gao, Meng-Fan Zhao, Zhi-Yan Jia, Xiao-Xia Hu, Qian-** Guo, Hai-Yan Du, Xiao-** Chen, Zhi-Qing Li

Abstract: Layered transition metal chalcogenides have stimulated a wide research interest due to their many exotic physical properties. In this paper, we studied the magnetotransport properties of the exfoliated TaNiTe5, a recently discovered Dirac nodal-line semimetal. A giant positive magnetoresistance (MR) is observed when the current is parallel to the crystallographic c axis, while it is strongly dimin… ▽ More Layered transition metal chalcogenides have stimulated a wide research interest due to their many exotic physical properties. In this paper, we studied the magnetotransport properties of the exfoliated TaNiTe5, a recently discovered Dirac nodal-line semimetal. A giant positive magnetoresistance (MR) is observed when the current is parallel to the crystallographic c axis, while it is strongly diminished when the current flows along the a axis. The observed giant MR is gradually suppressed either on reducing the thickness of nanoflake or on increasing temperature. By performing MR measurement in tilted magnetic fields, the interlayer coupling is found to be weakened both by reducing the thickness and by increasing temperature. We propose a mechanism of electron-electron interaction-assisted interlayer transport as a origin of the giant MR. The mechanism is likely to provide a explanation for the giant MR in other layered materials. △ Less

Submitted 18 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: 21 pages, 7 figures, 1 table

arXiv:2402.15283 [pdf, other]

When in Doubt, Think Slow: Iterative Reasoning with Latent Imagination

Authors: Martin Benfeghoul, Umais Zahid, Qinghai Guo, Zafeirios Fountas

Abstract: In an unfamiliar setting, a model-based reinforcement learning agent can be limited by the accuracy of its world model. In this work, we present a novel, training-free approach to improving the performance of such agents separately from planning and learning. We do so by applying iterative inference at decision-time, to fine-tune the inferred agent states based on the coherence of future state rep… ▽ More In an unfamiliar setting, a model-based reinforcement learning agent can be limited by the accuracy of its world model. In this work, we present a novel, training-free approach to improving the performance of such agents separately from planning and learning. We do so by applying iterative inference at decision-time, to fine-tune the inferred agent states based on the coherence of future state representations. Our approach achieves a consistent improvement in both reconstruction accuracy and task performance when applied to visual 3D navigation tasks. We go on to show that considering more future states further improves the performance of the agent in partially-observable environments, but not in a fully-observable one. Finally, we demonstrate that agents with less training pre-evaluation benefit most from our approach. △ Less

Submitted 23 February, 2024; originally announced February 2024.

ACM Class: I.2.0; I.2.8; I.2.10; I.4.5; I.4.10

arXiv:2402.14845 [pdf, other]

Purifying Large Language Models by Ensembling a Small Language Model

Authors: Tianlin Li, Qian Liu, Tianyu Pang, Chao Du, Qing Guo, Yang Liu, Min Lin

Abstract: The emerging success of large language models (LLMs) heavily relies on collecting abundant training data from external (untrusted) sources. Despite substantial efforts devoted to data cleaning and curation, well-constructed LLMs have been reported to suffer from copyright infringement, data poisoning, and/or privacy violations, which would impede practical deployment of LLMs. In this study, we pro… ▽ More The emerging success of large language models (LLMs) heavily relies on collecting abundant training data from external (untrusted) sources. Despite substantial efforts devoted to data cleaning and curation, well-constructed LLMs have been reported to suffer from copyright infringement, data poisoning, and/or privacy violations, which would impede practical deployment of LLMs. In this study, we propose a simple and easily implementable method for purifying LLMs from the negative effects caused by uncurated data, namely, through ensembling LLMs with benign and small language models (SLMs). Aside from theoretical guarantees, we perform comprehensive experiments to empirically confirm the efficacy of ensembling LLMs with SLMs, which can effectively preserve the performance of LLMs while mitigating issues such as copyright infringement, data poisoning, and privacy violations. △ Less

Submitted 19 February, 2024; originally announced February 2024.

ACM Class: I.2

arXiv:2402.13583 [pdf, other]

LongWanjuan: Towards Systematic Measurement for Long Text Quality

Authors: Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin

Abstract: The quality of training data are crucial for enhancing the long-text capabilities of foundation models. Despite existing efforts to refine data quality through heuristic rules and evaluations based on data diversity and difficulty, there's a lack of systematic approaches specifically tailored for assessing long texts. Addressing this gap, our work systematically measures the quality of long texts… ▽ More The quality of training data are crucial for enhancing the long-text capabilities of foundation models. Despite existing efforts to refine data quality through heuristic rules and evaluations based on data diversity and difficulty, there's a lack of systematic approaches specifically tailored for assessing long texts. Addressing this gap, our work systematically measures the quality of long texts by evaluating three fundamental linguistic dimensions: coherence, cohesion, and complexity. Drawing inspiration from the aforementioned three dimensions, we introduce a suite of metrics designed to evaluate the quality of long texts, encompassing both statistical and pre-trained language model-based ones. Leveraging these metrics, we present LongWanjuan, a bilingual dataset specifically tailored to enhance the training of language models for long-text tasks with over 160B tokens. In LongWanjuan, we categorize long texts into holistic, aggregated, and chaotic types, enabling a detailed analysis of long-text quality. Furthermore, we devise a data mixture recipe that strategically balances different types of long texts within LongWanjuan, leading to significant improvements in model performance on long-text tasks. The code and dataset are available at https://github.com/OpenLMLab/LongWanjuan. △ Less

Submitted 21 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Update Figures

arXiv:2402.13435 [pdf, other]

Learning to Retrieve for Job Matching

Authors: Jianqiang Shen, Yuchin Juan, Shaobo Zhang, ** Liu, Wen Pu, Sriram Vasudevan, Qingquan Song, Fedor Borisyuk, Kay Qianqi Shen, Haichao Wei, Yunxiang Ren, Yeou S. Chiou, Sicong Kuang, Yuan Yin, Ben Zheng, Muchen Wu, Shaghayegh Gharghabi, Xiaoqing Wang, Huichao Xue, Qi Guo, Daniel Hewlett, Luke Simon, Liangjie Hong, Wen**g Zhang

Abstract: Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we d… ▽ More Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIns job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and utilize learned links for retrieval. Our learned model is easy to explain, debug, and adjust. On the other hand, the focus for organic jobs is to optimize seeker engagement. We accomplished this by training embeddings for personalized retrieval, fortified by a set of rules derived from the categorization of member feedback. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.13055 [pdf, other]

Identifying Semantic Induction Heads to Understand In-Context Learning

Authors: Jie Ren, Qipeng Guo, Hang Yan, Dongrui Liu, Xipeng Qiu, Dahua Lin

Abstract: Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To gain a better understanding of LLMs, we conduct a detailed analysis of the operations of attention heads and aim to better understand the in-context learning of LLMs. Specifically, we investigate whether attention heads en… ▽ More Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To gain a better understanding of LLMs, we conduct a detailed analysis of the operations of attention heads and aim to better understand the in-context learning of LLMs. Specifically, we investigate whether attention heads encode two types of relationships between tokens present in natural languages: the syntactic dependency parsed from sentences and the relation within knowledge graphs. We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens. More crucially, the formulation of such semantic induction heads has a close correlation with the emergence of the in-context learning ability of language models. The study of semantic attention heads advances our understanding of the intricate operations of attention heads in transformers, and further provides new insights into the in-context learning of LLMs. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.13013 [pdf, other]

Code Needs Comments: Enhancing Code LLMs with Comment Augmentation

Authors: Demin Song, Honglin Guo, Yunhua Zhou, Shuhao Xing, Yudong Wang, Zifan Song, Wenwei Zhang, Qipeng Guo, Hang Yan, Xipeng Qiu, Dahua Lin

Abstract: The programming skill is one crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs). We examine the impact of pre-training data on code-focused LLMs' performance by assessing the comment density as a measure of PL-NL alignment. Given the scarcity of code-comment aligned data in pre-train… ▽ More The programming skill is one crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs). We examine the impact of pre-training data on code-focused LLMs' performance by assessing the comment density as a measure of PL-NL alignment. Given the scarcity of code-comment aligned data in pre-training corpora, we introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated with natural language. We conducted experiments on three code-focused LLMs and observed consistent improvements in performance on two widely-used programming skill benchmarks. Notably, the model trained on the augmented data outperformed both the model used for generating comments and the model further trained on the data without augmentation. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12760 [pdf, other]

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

Authors: Nailei Hei, Qianyu Guo, Zihao Wang, Yan Wang, Haofen Wang, Wenqiang Zhang

Abstract: Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images. Although existing prompt engineering methods can provide high-level guidance, it is challenging for novice users to achieve the desired results by manually entering prompts due to a discrepancy between novice-user-input prompts and the model-preferred prompts. To bridge the distributio… ▽ More Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images. Although existing prompt engineering methods can provide high-level guidance, it is challenging for novice users to achieve the desired results by manually entering prompts due to a discrepancy between novice-user-input prompts and the model-preferred prompts. To bridge the distribution gap between user input behavior and model training datasets, we first construct a novel Coarse-Fine Granularity Prompts dataset (CFP) and propose a novel User-Friendly Fine-Grained Text Generation framework (UF-FGTG) for automated prompt optimization. For CFP, we construct a novel dataset for text-to-image tasks that combines coarse and fine-grained prompts to facilitate the development of automated prompt generation methods. For UF-FGTG, we propose a novel framework that automatically translates user-input prompts into model-preferred prompts. Specifically, we propose a prompt refiner that continually rewrites prompts to empower users to select results that align with their unique needs. Meanwhile, we integrate image-related loss functions from the text-to-image model into the training process of text generation to generate model-preferred prompts. Additionally, we propose an adaptive feature extraction module to ensure diversity in the generated results. Experiments demonstrate that our approach is capable of generating more visually appealing and diverse images than previous state-of-the-art methods, achieving an average improvement of 5% across six quality and aesthetic metrics. △ Less

Submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted by The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv:2402.12399 [pdf, other]

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

Authors: Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

Abstract: Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism suffers from redundancy computation and memory costs due to the unbalanced routing. Some experts are overflow, where the exceeding tokens are dropped. While some experts are vacant, which are padded with zeros, negatively… ▽ More Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism suffers from redundancy computation and memory costs due to the unbalanced routing. Some experts are overflow, where the exceeding tokens are dropped. While some experts are vacant, which are padded with zeros, negatively impacting model performance. To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification. The Intra-GPU Rectification handles dropped tokens, efficiently routing them to experts within the GPU where they are located to avoid inter-GPU communication. The Fill-in Rectification addresses padding by replacing padding tokens with the tokens that have high routing scores. Our experimental results demonstrate that the Intra-GPU Rectification and the Fill-in Rectification effectively handle dropped tokens and padding, respectively. Furthermore, the combination of them achieves superior performance, surpassing the accuracy of the vanilla top-1 router by 4.7%. △ Less

Submitted 21 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.12150 [pdf, other]

Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One

Authors: Tianlin Li, Xiaoyu Zhang, Chao Du, Tianyu Pang, Qian Liu, Qing Guo, Chao Shen, Yang Liu

Abstract: The widespread adoption of large language models (LLMs) underscores the urgent need to ensure their fairness. However, LLMs frequently present dominant viewpoints while ignoring alternative perspectives from minority parties, resulting in potential biases. We hypothesize that these fairness-violating behaviors occur because LLMs express their viewpoints using a human personality that represents th… ▽ More The widespread adoption of large language models (LLMs) underscores the urgent need to ensure their fairness. However, LLMs frequently present dominant viewpoints while ignoring alternative perspectives from minority parties, resulting in potential biases. We hypothesize that these fairness-violating behaviors occur because LLMs express their viewpoints using a human personality that represents the majority of training data. In response to this, we validate that prompting LLMs with specific roles can allow LLMs to express diverse viewpoints. Building on this insight and observation, we develop FairThinking, a pipeline designed to automatically generate roles that enable LLMs to articulate diverse perspectives for fair expressions. To evaluate FairThinking, we create a dataset with a thousand items covering three fairness-related topics and conduct experiments on GPT-3.5, GPT-4, Llama2, and Mistral to demonstrate its superior performance. △ Less

Submitted 19 February, 2024; originally announced February 2024.

ACM Class: I.2; J.4

arXiv:2402.11678 [pdf, other]

The boundary of cosmic filaments

Authors: Wei Wang, Peng Wang, Hong Guo, Xi Kang, Noam I. Libeskind, Daniela Galarraga-Espinosa, Volker Springel, Rahul Kannan, Lars Hernquist, Rudiger Pakmor, Haoran Yu, Sownak Bose, Quan Guo, Luo Yu, Cesar Hernandez-Aguayo

Abstract: For decades, the boundary of cosmic filaments have been a subject of debate. In this work, we determine the physically-motivated radii of filaments by constructing stacked galaxy number density profiles around the filament spines. We find that the slope of the profile changes with distance to the filament spine, reaching its minimum at approximately 1 Mpc at z = 0 in both state-of-the-art hydrodyn… ▽ More For decades, the boundary of cosmic filaments have been a subject of debate. In this work, we determine the physically-motivated radii of filaments by constructing stacked galaxy number density profiles around the filament spines. We find that the slope of the profile changes with distance to the filament spine, reaching its minimum at approximately 1 Mpc at z = 0 in both state-of-the-art hydrodynamical simulations and observational data. This can be taken as the average value of the filament radius. Furthermore, we note that the average filament radius rapidly decreases from z = 4 to z = 1, and then slightly increases. Moreover, we find that the filament radius depends on the filament length, the distance from connected clusters, and the masses of the clusters. These results suggest a two-phase formation scenario of cosmic filaments. The filaments experience rapid contraction before z = 1, but their density distribution has remained roughly stable since then. The subsequent mass transport along the filaments to the connected clusters is likely to have contributed to the formation of the clusters themselves. △ Less

Submitted 18 February, 2024; originally announced February 2024.

Comments: 11 pages, 6 figures, submitted

arXiv:2402.11207 [pdf, ps, other]

Search for the production of deuterons and antideuterons in e^+e^- annihilation at center-of-mass energies between 4.13 and 4.70 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (593 additional authors not shown)

Abstract: Using a data sample of $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to ppπ^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed and the upper limit of the… ▽ More Using a data sample of $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to ppπ^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed and the upper limit of the $e^+e^-\to ppπ^-\bar{d}+c.c.$ cross section is determined to be from 9.0 to 145 fb depending on the center-of-mass energy at the $90\%$ confidence level. △ Less

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.10594 [pdf, other]

Strong decays of the $Λ_{c}(2910)$ and $Λ_{c}(2940)$ in the $ND^{*}$ molecular frame

Authors: Zi-Li Yue, Quan-Yun Guo, Dian-Yong Chen

Abstract: Stimulated by the observation of a new structure, named $Λ_{c}(2910)$, in the $Σ_{c}(2455)^{0,++}π^{+,-}$ decay channel from B meson decay process by the Belle Collaboration, and the similarity of the $Λ_{c}(2910)/Λ_{c}(2940)$ and $P_{c}$ states, we investigate the decay behavior of the $Λ_{c}(2910)$ and $Λ_{c}(2940)$ in the $N D^{*}$ molecular frame with the possible $J^{P}$ quantum numbers to be… ▽ More Stimulated by the observation of a new structure, named $Λ_{c}(2910)$, in the $Σ_{c}(2455)^{0,++}π^{+,-}$ decay channel from B meson decay process by the Belle Collaboration, and the similarity of the $Λ_{c}(2910)/Λ_{c}(2940)$ and $P_{c}$ states, we investigate the decay behavior of the $Λ_{c}(2910)$ and $Λ_{c}(2940)$ in the $N D^{*}$ molecular frame with the possible $J^{P}$ quantum numbers to be $1/2^{-}$ and $3/2^{-}$. We employ an effective Lagrangian approach to evaluate the partial widths of $ND$, $Σ_{c}π$ and $Σ_{c}^{*}π$ channels. The estimations in the present work indicate that the $J^P$ quantum numbers of $Λ_{c}(2910)$ and $Λ_{c}(2940)$ are preferred to be $1/2^{-}$ and $3/2^{-}$, respectively. From the present estimations, we also find the branching ratio of $Λ_c(3/2) \to Σ_c^\ast π$ is much larger than that of $Λ_c(1/2) \to Σ_c^\ast π$, thus $Σ_{c}^{*}π$ could be a good channel to distinguish the $J^{P}$ quantum numbers of $Λ_{c}(2910)$ and $Λ_{c}(2940)$. Therefore, we suggest to searching for the structure in $Σ_{c}^{*}π$ invariant mass distribution in Belle $\amalg$. △ Less

Submitted 26 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: 12 pages, 13 figures, 3 tables, accepted for publication in PRD

arXiv:2402.10593 [pdf, other]

Bayesian Learning for Double-RIS Aided ISAC Systems with Superimposed Pilots and Data

Authors: Xu Gan, Chongwen Huang, Zhaohui Yang, Caijun Zhong, Xiaoming Chen, Zhaoyang Zhang, Qinghua Guo, Chau Yuen, Merouane Debbah

Abstract: Reconfigurable intelligent surface (RIS) has great potential to improve the performance of integrated sensing and communication (ISAC) systems, especially in scenarios where line-of-sight paths between the base station and users are blocked. However, the spectral efficiency (SE) of RIS-aided ISAC uplink transmissions may be drastically reduced by the heavy burden of pilot overhead for realizing se… ▽ More Reconfigurable intelligent surface (RIS) has great potential to improve the performance of integrated sensing and communication (ISAC) systems, especially in scenarios where line-of-sight paths between the base station and users are blocked. However, the spectral efficiency (SE) of RIS-aided ISAC uplink transmissions may be drastically reduced by the heavy burden of pilot overhead for realizing sensing capabilities. In this paper, we tackle this bottleneck by proposing a superimposed symbol scheme, which superimposes sensing pilots onto data symbols over the same time-frequency resources. Specifically, we develop a structure-aware sparse Bayesian learning framework, where decoded data symbols serve as side information to enhance sensing performance and increase SE. To meet the low-latency requirements of emerging ISAC applications, we further propose a low-complexity simultaneous communication and localization algorithm for multiple users. This algorithm employs the unitary approximate message passing in the Bayesian learning framework for initial angle estimate, followed by iterative refinements through reduced-dimension matrix calculations. Moreover, the sparse code multiple access technology is incorporated into this iterative framework for accurate data detection which also facilitates localization. Numerical results show that the proposed superimposed symbol-based scheme empowered by the developed algorithm can achieve centimeter-level localization while attaining up to $96\%$ of the SE of conventional communications without sensing capabilities. Moreover, compared to other typical ISAC schemes, the proposed superimposed symbol scheme can provide an effective throughput improvement over $133\%$. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.09263 [pdf]

Uncertainty-Aware Transient Stability-Constrained Preventive Redispatch: A Distributional Reinforcement Learning Approach

Authors: Zhengcheng Wang, Fei Teng, Yanzhen Zhou, Qinglai Guo, Hongbin Sun

Abstract: Transient stability-constrained preventive redispatch plays a crucial role in ensuring power system security and stability. Since redispatch strategies need to simultaneously satisfy complex transient constraints and the economic need, model-based formulation and optimization become extremely challenging. In addition, the increasing uncertainty and variability introduced by renewable sources start… ▽ More Transient stability-constrained preventive redispatch plays a crucial role in ensuring power system security and stability. Since redispatch strategies need to simultaneously satisfy complex transient constraints and the economic need, model-based formulation and optimization become extremely challenging. In addition, the increasing uncertainty and variability introduced by renewable sources start to drive the system stability consideration from deterministic to probabilistic, which further exaggerates the complexity. In this paper, a Graph neural network guided Distributional Deep Reinforcement Learning (GD2RL) method is proposed, for the first time, to solve the uncertainty-aware transient stability-constrained preventive redispatch problem. First, a graph neural network-based transient simulator is trained by supervised learning to efficiently generate post-contingency rotor angle curves with the steady-state and contingency as inputs, which serves as a feature extractor for operating states and a surrogate time-domain simulator during the environment interaction for reinforcement learning. Distributional deep reinforcement learning with explicit uncertainty distribution of system operational conditions is then applied to generate the redispatch strategy to balance the user-specified probabilistic stability performance and economy preferences. The full distribution of the post-redispatch transient stability index is directly provided as the output. Case studies on the modified New England 39-bus system validate the proposed method. △ Less

Submitted 29 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 13 pages,11 figures,accepted by IEEE Transactions on Power Systems on 24-Jun-2024

arXiv:2402.06341 [pdf, other]

RareBench: Can LLMs Serve as Rare Diseases Specialists?

Authors: Xuanzhong Chen, Xiaohao Mao, Qihan Guo, Lun Wang, Shuyang Zhang, Ting Chen

Abstract: Generalist Large Language Models (LLMs), such as GPT-4, have shown considerable promise in various domains, including medical diagnosis. Rare diseases, affecting approximately 300 million people worldwide, often have unsatisfactory clinical diagnosis rates primarily due to a lack of experienced physicians and the complexity of differentiating among many rare diseases. In this context, recent news… ▽ More Generalist Large Language Models (LLMs), such as GPT-4, have shown considerable promise in various domains, including medical diagnosis. Rare diseases, affecting approximately 300 million people worldwide, often have unsatisfactory clinical diagnosis rates primarily due to a lack of experienced physicians and the complexity of differentiating among many rare diseases. In this context, recent news such as "ChatGPT correctly diagnosed a 4-year-old's rare disease after 17 doctors failed" underscore LLMs' potential, yet underexplored, role in clinically diagnosing rare diseases. To bridge this research gap, we introduce RareBench, a pioneering benchmark designed to systematically evaluate the capabilities of LLMs on 4 critical dimensions within the realm of rare diseases. Meanwhile, we have compiled the largest open-source dataset on rare disease patients, establishing a benchmark for future studies in this domain. To facilitate differential diagnosis of rare diseases, we develop a dynamic few-shot prompt methodology, leveraging a comprehensive rare disease knowledge graph synthesized from multiple knowledge bases, significantly enhancing LLMs' diagnostic performance. Moreover, we present an exhaustive comparative study of GPT-4's diagnostic capabilities against those of specialist physicians. Our experimental findings underscore the promising potential of integrating LLMs into the clinical diagnostic process for rare diseases. This paves the way for exciting possibilities in future advancements in this field. △ Less

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.06329 [pdf]

A Network for structural dense displacement based on 3D deformable mesh model and optical flow

Authors: Peimian Du, Qicheng Guo, Yanru Li

Abstract: This study proposes a Network to recognize displacement of a RC frame structure from a video by a monocular camera. The proposed Network consists of two modules which is FlowNet2 and POFRN-Net. FlowNet2 is used to generate dense optical flow as well as POFRN-Net is to extract pose parameter H. FlowNet2 convert two video frames into dense optical flow. POFRN-Net is inputted dense optical flow from… ▽ More This study proposes a Network to recognize displacement of a RC frame structure from a video by a monocular camera. The proposed Network consists of two modules which is FlowNet2 and POFRN-Net. FlowNet2 is used to generate dense optical flow as well as POFRN-Net is to extract pose parameter H. FlowNet2 convert two video frames into dense optical flow. POFRN-Net is inputted dense optical flow from FlowNet2 to output the pose parameter H. The displacement of any points of structure can be calculated from parameter H. The Fast Fourier Transform (FFT) is applied to obtain frequency domain signal from corresponding displacement signal. Furthermore, the comparison of the truth displacement on the First floor of the First video is shown in this study. Finally, the predicted displacements on four floors of RC frame structure of given three videos are exhibited in the last of this study. △ Less

Submitted 9 February, 2024; originally announced February 2024.

Comments: Paper for the 3rd International Competition for Structural Health Monitoring (IC-SHM 2022): 15 pages, 13 figures

arXiv:2402.05492 [pdf, other]

Cosmological Forecast of the Void Size Function Measurement from the CSST Spectroscopic Survey

Authors: Yingxiao Song, Qi Xiong, Yan Gong, Furen Deng, Kwan Chuen Chan, Xuelei Chen, Qi Guo, Jiaxin Han, Guoliang Li, Ming Li, Yun Liu, Yu Luo, Wenxiang Pei, Chengliang Wei

Abstract: Void size function (VSF) contains information of the cosmic large-scale structure (LSS), and can be used to derive the properties of dark energy and dark matter. We predict the VSFs measured from the spectroscopic galaxy survey operated by the China Space Station Telescope (CSST), and study the strength of cosmological constraint. We employ a high-resolution Jiutian simulation to get CSST galaxy m… ▽ More Void size function (VSF) contains information of the cosmic large-scale structure (LSS), and can be used to derive the properties of dark energy and dark matter. We predict the VSFs measured from the spectroscopic galaxy survey operated by the China Space Station Telescope (CSST), and study the strength of cosmological constraint. We employ a high-resolution Jiutian simulation to get CSST galaxy mock samples based on an improved semi-analytical model. We identify voids from this galaxy catalog using the watershed algorithm without assuming a spherical shape, and estimate the VSFs at different redshift bins from $z=0.5$ to 1.1. We propose a void selection method based on the ellipticity, and assume the void linear underdensity threshold $δ_{\rm v}$ in the theoretical model is redshift-dependent and set it as a free parameter in each redshift bin. The Markov Chain Monte Carlo (MCMC) method is adopted to implement the constraints on the cosmological and void parameters. We find that the CSST VSF measurement can constrain the cosmological parameters to a few percent level. The best-fit values of $δ_{\rm v}$ are ranging from $\sim-0.4$ to $-0.1$ as the redshift increases from 0.5 to 1.1, which has a distinct difference from the theoretical calculation with $δ_{\rm v}\simeq-2.7$ assuming the spherical evolution and using particles as tracer. Our method can provide a good reference for void identification and selection in the VSF analysis of the spectroscopic galaxy surveys. △ Less

Submitted 24 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 10 pages, 7 figures, 3 tables. Accepted for publication in MNRAS

arXiv:2402.04546 [pdf, other]

LiDAR-Forest Dataset: LiDAR Point Cloud Simulation Dataset for Forestry Application

Authors: Yawen Lu, Zhuoyang Sun, **yuan Shao, Qianyu Guo, Yunhan Huang, Songlin Fei, Yingjie Chen

Abstract: The popularity of LiDAR devices and sensor technology has gradually empowered users from autonomous driving to forest monitoring, and research on 3D LiDAR has made remarkable progress over the years. Unlike 2D images, whose focused area is visible and rich in texture information, understanding the point distribution can help companies and researchers find better ways to develop point-based 3D appl… ▽ More The popularity of LiDAR devices and sensor technology has gradually empowered users from autonomous driving to forest monitoring, and research on 3D LiDAR has made remarkable progress over the years. Unlike 2D images, whose focused area is visible and rich in texture information, understanding the point distribution can help companies and researchers find better ways to develop point-based 3D applications. In this work, we contribute an unreal-based LiDAR simulation tool and a 3D simulation dataset named LiDAR-Forest, which can be used by various studies to evaluate forest reconstruction, tree DBH estimation, and point cloud compression for easy visualization. The simulation is customizable in tree species, LiDAR types and scene generation, with low cost and high efficiency. △ Less

Submitted 15 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 5 pages

arXiv:2402.03829 [pdf, ps, other]

Precise Measurement of Born Cross Sections for $e^+e^-\to D\bar{D}$ and Observation of One Structure between $\sqrt{s} = 3.80-4.95$ GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (604 additional authors not shown)

Abstract: Using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies ranging from 3.80 to 4.95 GeV, corresponding to an integrated luminosity of 20 fb$^{-1}$, a measurement of Born cross sections for the $e^+e^-\to D^{0}\bar{D}^{0}$ and $D^{+}D^{-}$ processes is presented with unprecedented precision. By performing a simultaneous fit to the dressed cross sections… ▽ More Using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies ranging from 3.80 to 4.95 GeV, corresponding to an integrated luminosity of 20 fb$^{-1}$, a measurement of Born cross sections for the $e^+e^-\to D^{0}\bar{D}^{0}$ and $D^{+}D^{-}$ processes is presented with unprecedented precision. By performing a simultaneous fit to the dressed cross sections for both processes, one possible new structure around 3.9 GeV/$c^2$ is observed for the first time, in addition to seven known resonances $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $Y(4230)$, $Y(4360)$, $ψ(4415)$, and $Y(4660)$. These results offer crucial experimental insights into the nature of hadron production in the open charm region. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: 9 pages, 4 figures, 1 tables, 1 Supplemental_Material

arXiv:2402.03705 [pdf, other]

FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution

Authors: Qi Zhou, Dongxia Wang, Tianlin Li, Zhihong Xu, Yang Liu, Kui Ren, Wenhai Wang, Qing Guo

Abstract: Guided image synthesis methods, like SDEdit based on the diffusion model, excel at creating realistic images from user inputs such as stroke paintings. However, existing efforts mainly focus on image quality, often overlooking a key point: the diffusion model represents a data distribution, not individual images. This introduces a low but critical chance of generating images that contradict user i… ▽ More Guided image synthesis methods, like SDEdit based on the diffusion model, excel at creating realistic images from user inputs such as stroke paintings. However, existing efforts mainly focus on image quality, often overlooking a key point: the diffusion model represents a data distribution, not individual images. This introduces a low but critical chance of generating images that contradict user intentions, raising ethical concerns. For example, a user inputting a stroke painting with female characteristics might, with some probability, get male faces from SDEdit. To expose this potential vulnerability, we aim to build an adversarial attack forcing SDEdit to generate a specific data distribution aligned with a specified attribute (e.g., female), without changing the input's attribute characteristics. We propose the Targeted Attribute Generative Attack (TAGA), using an attribute-aware objective function and optimizing the adversarial noise added to the input stroke painting. Empirical studies reveal that traditional adversarial noise struggles with TAGA, while natural perturbations like exposure and motion blur easily alter generated images' attributes. To execute effective attacks, we introduce FoolSDEdit: We design a joint adversarial exposure and blur attack, adding exposure and motion blur to the stroke painting and optimizing them together. We optimize the execution strategy of various perturbations, framing it as a network architecture search problem. We create the SuperPert, a graph representing diverse execution strategies for different perturbations. After training, we obtain the optimized execution strategy for effective TAGA against SDEdit. Comprehensive experiments on two datasets show our method compelling SDEdit to generate a targeted attribute-aware data distribution, significantly outperforming baselines. △ Less

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.03661 [pdf, other]

Transductive Reward Inference on Graph

Authors: Bohao Qu, Xiaofeng Cao, Qing Guo, Yi Chang, Ivor W. Tsang, Chengqi Zhang

Abstract: In this study, we present a transductive inference approach on that reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, while direct environmental interactions are either too costly or unethical and the reward functions are ra… ▽ More In this study, we present a transductive inference approach on that reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, while direct environmental interactions are either too costly or unethical and the reward functions are rarely accessible, such as in healthcare and robotics. Our research focuses on develo** a reward inference method based on the contextual properties of information propagation on graphs that capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point during several iterations of the transductive inference process and demonstrate its at least convergence to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach. The application of our inferred rewards improves the performance in offline reinforcement learning tasks. △ Less

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02738 [pdf, other]

Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective

Authors: Yihao Huang, Kaiyuan Yu, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Tianlin Li, Geguang Pu, Yang Liu

Abstract: In recent years, LiDAR-camera fusion models have markedly advanced 3D object detection tasks in autonomous driving. However, their robustness against common weather corruption such as fog, rain, snow, and sunlight in the intricate physical world remains underexplored. In this paper, we evaluate the robustness of fusion models from the perspective of fusion strategies on the corrupted dataset. Base… ▽ More In recent years, LiDAR-camera fusion models have markedly advanced 3D object detection tasks in autonomous driving. However, their robustness against common weather corruption such as fog, rain, snow, and sunlight in the intricate physical world remains underexplored. In this paper, we evaluate the robustness of fusion models from the perspective of fusion strategies on the corrupted dataset. Based on the evaluation, we further propose a concise yet practical fusion strategy to enhance the robustness of the fusion models, namely flexibly weighted fusing features from LiDAR and camera sources to adapt to varying weather scenarios. Experiments conducted on four types of fusion models, each with two distinct lightweight implementations, confirm the broad applicability and effectiveness of the approach. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 17 pages

Showing 101–150 of 1,510 results for author: Guo, Q