Search | arXiv e-print repository

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

Authors: Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lu, Baining Guo

Abstract: The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved. Very recent works enable LVLMs to localize object-level visual contents and ground text to them. Nonetheless, current LVLMs still struggle to precisely understand visual relations due to the lack of relevant data. In this work, we present RelationVLM, a large vision-language model capable of comprehending various levels and types of relations whether across multiple images or within a video. Specifically, we devise a multi-stage relation-aware training scheme and a series of corresponding data configuration strategies to bestow RelationVLM with the capabilities of understanding semantic relations, temporal associations and geometric transforms. Extensive case studies and quantitative evaluations show that RelationVLM has a strong capability in understanding such relations and exhibits an impressive in-context capability of reasoning from few-shot examples by comparison. This work fosters the advancement of LVLMs by enabling them to support a wider range of downstream applications toward artificial general intelligence.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11238 [pdf, other]

JUMBO: Fully Asynchronous BFT Consensus Made Truly Scalable

Authors: Hao Cheng, Yuan Lu, Zhenliang Lu, Qiang Tang, Yuxuan Zhang, Zhenfeng Zhang

Abstract: Recent progress in asynchronous Byzantine fault-tolerant (BFT) consensus, e.g. Dumbo-NG (CCS' 22) and Tusk (EuroSys' 22), shows promising performance through decoupling transaction dissemination and block agreement. However, when executed with a larger number $n$ of nodes, like several hundreds, they would suffer from significant degradation in performance. Their dominating scalability bottleneck is the huge authenticator complexity: each node has to multicast $\mathcal{O}(n)$ quorum certificates (QCs) and subsequently verify them for each block. This paper systematically investigates and resolves the above scalability issue. We first propose a signature-free asynchronous BFT consensus FIN-NG that adapts a recent signature-free asynchronous common subset protocol FIN (CCS' 23) into the state-of-the-art framework of concurrent broadcast and agreement. The liveness of FIN-NG relies on our non-trivial redesign of FIN's multi-valued validated Byzantine agreement towards achieving optimal quality. FIN-NG greatly improves the performance of FIN and already outperforms Dumbo-NG in most deployment settings. To further overcome the scalability limit of FIN-NG due to $\mathcal{O}(n^3)$ messages, we propose JUMBO, a scalable instantiation of Dumbo-NG, with only $\mathcal{O}(n^2)$ complexities for both authenticators and messages. We use various aggregation and dispersal techniques for QCs to significantly reduce the authenticator complexity of original Dumbo-NG implementations by up to $\mathcal{O}(n^2)$ orders.
We also propose a ``fairness'' patch for JUMBO, thus preventing a flooding adversary from controlling an overwhelming portion of transactions in its output.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11118 [pdf, other]

Impact of Starlink constellation on Early LSST: a Photometric Analysis of Satellite Trails with BRDF Model

Authors: Yao Lu

Abstract: We report a simulation and quantification of the impact of the Starlink constellation on LSST in terms of the trail surface brightness using a BRDF-based satellite photometric model. A total of 11,908 satellites from the Gen1 and Gen2A constellations are used to focus on the interference to the initial phase of LSST operation. The all-sky simulation shows that approximately 69.33% of the visible satellites over station have an apparent brightness greater than 7 mag with a v1.5 satellite model. The impact of satellite streaks exhibits a non-monotonic relationship to the solar altitude, with the worst moments occurring around $-15^{\circ}$ solar altitude. The assessment based on simulated schedules indicates that no trails can reach the saturation-level magnitude, but 71.61% of trails show a surface brightness brighter than the best-case crosstalk correctable limits, and this percentage increases as the dodging weight increases. Therefore, avoiding satellites in the scheduler algorithm is an effective mitigation method, but both the number of streaks and their brightness should be taken into account simultaneously.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures

arXiv:2403.10877 [pdf, ps, other]

Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (637 additional authors not shown)

Abstract: We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominant $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 9 pages, 3 figures

arXiv:2403.10754 [pdf, other]

doi 10.1093/mnras/stae762

CSST large-scale structure analysis pipeline: I. constructing reference mock galaxy redshift surveys

Authors: Yizhou Gu, Xiaohu Yang, Jiaxin Han, Yirong Wang, Qingyang Li, Zhenlin Tan, Wenkang Jiang, Yaru Wang, Jiaqi Wang, Antonios Katsianis, Xiaoju Xu, Haojie Xu, Wensheng Hong, Houjun Mo, Run Wen, Xianzhong Zheng, Feng Shi, Pengjie Zhang, Zhongxu Zhai, Chengze Liu, Wenting Wang, Ying Zu, Hong Guo, Youcai Zhang, Yi Lu , et al. (7 additional authors not shown)

Abstract: In this paper, we set out to construct a set of reference mock galaxy redshift surveys (MGRSs) for the future Chinese Space-station Survey Telescope (CSST) observation, where subsequent survey selection effects can be added and evaluated. This set of MGRSs is generated using the dark matter subhalos extracted from a high-resolution Jiutian $N$-body simulation of the standard $Λ$CDM cosmogony with $Ω_m=0.3111$, $Ω_Λ=0.6889$, and $σ_8=0.8102$. The simulation has a boxsize of $1~h^{-1} {\rm Gpc}$, and consists of $6144^3$ particles with mass resolution $3.723 \times 10^{8} h^{-1} M_\odot $. In order to take into account the effect of redshift evolution, we first use all 128 snapshots in the Jiutian simulation to generate a light-cone halo/subhalo catalog. Next, galaxy luminosities are assigned to the main and subhalo populations using the subhalo abundance matching (SHAM) method with the DESI $z$-band luminosity functions at different redshifts. Multi-band photometries, as well as images, are then assigned to each mock galaxy using a 3-dimensional parameter space nearest neighbor sampling of the DESI LS observational galaxies and groups. Finally, the CSST and DESI LS survey geometry and magnitude limit cuts are applied to generate the required MGRSs. As we have checked, this set of MGRSs can generally reproduce the observed galaxy luminosity/mass functions within 0.1 dex for galaxies with $L > 10^8 L_\odot$ (or $M_* > 10^{8.5} M_\odot$) and within 1-$σ$ level for galaxies with $L < 10^8L_\odot$ (or $M_* < 10^{8.5} M_\odot$).
Together with the CSST slitless spectra and redshifts for our DESI LS seed galaxies that are under construction, we will set out to test various slitless observational selection effects in subsequent probes.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures, accepted for publication in MNRAS

arXiv:2403.09889 [pdf, other]

Generalization of Scaled Deep ResNets in the Mean-Field Regime

Authors: Yihang Chen, Fanghui Liu, Yiping Lu, Grigorios G. Chrysos, Volkan Cevher

Abstract: Despite the widespread empirical success of ResNet, the generalization properties of deep ResNet are rarely explored beyond the lazy training regime. In this work, we investigate \emph{scaled} ResNet in the limit of infinitely deep and wide neural networks, of which the gradient flow is described by a partial differential equation in the large-neural network limit, i.e., the \emph{mean-field} regime. To derive the generalization bounds under this setting, our analysis necessitates a shift from the conventional time-invariant Gram matrix employed in the lazy training regime to a time-variant, distribution-dependent version. To this end, we provide a global lower bound on the minimum eigenvalue of the Gram matrix under the mean-field regime. Besides, for the traceability of the dynamic of Kullback-Leibler (KL) divergence, we establish the linear convergence of the empirical error and estimate the upper bound of the KL divergence over parameters distribution. Finally, we build the uniform convergence for generalization bound via Rademacher complexity. Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime and contribute to advancing the understanding of the fundamental properties of deep neural networks.

Submitted 14 March, 2024; originally announced March 2024.

Comments: ICLR 2024 (Spotlight)

arXiv:2403.09798 [pdf, other]

Comparing Rationality Between Large Language Models and Humans: Insights and Open Questions

Authors: Dana Alsagheer, Rabimba Karanjai, Nour Diallo, Weidong Shi, Yang Lu, Suha Beydoun, Qiaoning Zhang

Abstract: This paper delves into the dynamic landscape of artificial intelligence, specifically focusing on the burgeoning prominence of large language models (LLMs). We underscore the pivotal role of Reinforcement Learning from Human Feedback (RLHF) in augmenting LLMs' rationality and decision-making prowess. By meticulously examining the intricate relationship between human interaction and LLM behavior, we explore questions surrounding rationality and performance disparities between humans and LLMs, with particular attention to the Chat Generative Pre-trained Transformer. Our research employs comprehensive comparative analysis and delves into the inherent challenges of irrationality in LLMs, offering valuable insights and actionable strategies for enhancing their rationality. These findings hold significant implications for the widespread adoption of LLMs across diverse domains and applications, underscoring their potential to catalyze advancements in artificial intelligence.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09750 [pdf, other]

Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models

Authors: Zhuoqun Li, Hongyu Lin, Yaojie Lu, Hao Xiang, Xianpei Han, Le Sun

Abstract: Declarative knowledge and procedural knowledge are two key parts of meta-cognitive theory, and both hold significant importance in the pre-training and inference of LLMs. However, a comprehensive analysis comparing these two types of knowledge is lacking, primarily due to challenges in definition, probing and quantitative assessment. In this paper, we explore from a new perspective by providing ground-truth knowledge for LLMs and evaluating the effective score. Through extensive experiments with widely-used datasets and models, we reach the following conclusions: (1) In most tasks, benefits from declarative knowledge are greater than those from procedural knowledge. (2) Profits of procedural knowledge are larger than those of declarative knowledge only in reasoning tasks with simple logic. (3) As pre-training progresses and size increases, model ability to utilize both kinds of knowledge significantly improves, but at different speeds. We provide a detailed analysis of the findings, which can offer primary guidance for the evaluation and enhancement of large language models.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024 as a short paper

arXiv:2403.09464 [pdf, other]

doi 10.1051/0004-6361/202348460

New constraints on Triton's atmosphere from the 6 October 2022 stellar occultation

Authors: Ye Yuan, Chen Zhang, Fan Li, Jian Chen, Yanning Fu, Chunhai Bai, Xing Gao, Yong Wang, Tuhong Zhong, Yixing Gao, Liang Wang, Donghua Chen, Yixing Zhang, Yang Zhang, Wenpeng Xie, Shupi Zhang, Ding Liu, Jun Cao, Xiangdong Yin, Xiaojun Mo, **g Liu, Xinru Han, Tong Liu, Yuqiang Chen, Zhendong Gao , et al. (25 additional authors not shown)

Abstract: The atmosphere of Triton was probed directly by observing a ground-based stellar occultation on 6 October 2022. This rare event yielded 23 positive light curves collected from 13 separate observation stations contributing to our campaign. The significance of this event lies in its potential to directly validate the modest pressure fluctuation on Triton, a phenomenon not definitively verified by previous observations, including only five stellar occultations, and the Voyager 2 radio occultation in 1989. Using an approach consistent with a comparable study, we precisely determined a surface pressure of $14.07_{-0.13}^{+0.21}~\mathrm{μbar}$ in 2022. This new pressure rules out any significant monotonic variation in pressure between 2017 and 2022 through direct observations, as it is in alignment with the 2017 value. Additionally, both the pressures in 2017 and 2022 align with the 1989 value. This provides further support for the conclusion drawn from the previous volatile transport model simulation, which is consistent with the observed alignment between the pressures in 1989 and 2017; that is to say, the pressure fluctuation is modest. Moreover, this conclusion suggests the existence of a northern polar cap extended down to at least $45^\circ$N$-60^\circ$N and the presence of nitrogen between $30^\circ$S and $0^\circ$.

Submitted 24 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Astronomy & Astrophysics, in press. 9 pages, 2 figures, 3 tables

Journal ref: A&A 684, L13 (2024)

arXiv:2403.08689 [pdf, other]

Exploiting Structural Consistency of Chest Anatomy for Unsupervised Anomaly Detection in Radiography Images

Authors: Tiange Xiang, Yixiao Zhang, Yongyi Lu, Alan Yuille, Chaoyi Zhang, Weidong Cai, Zongwei Zhou

Abstract: Radiography imaging protocols focus on particular body regions, therefore producing images of great similarity and yielding recurrent anatomical structures across patients. Exploiting this structured information could potentially ease the detection of anomalies from radiography images. To this end, we propose a Simple Space-Aware Memory Matrix for In-painting and Detecting anomalies from radiography images (abbreviated as SimSID). We formulate anomaly detection as an image reconstruction task, consisting of a space-aware memory matrix and an in-painting block in the feature space. During training, SimSID can taxonomize the ingrained anatomical structures into recurrent visual patterns, and during inference, it can identify anomalies (unseen/modified visual patterns) from the test image. Our SimSID surpasses the state of the art in unsupervised anomaly detection by +8.0%, +5.0%, and +9.9% AUC scores on ZhangLab, COVIDx, and CheXpert benchmark datasets, respectively. Code: https://github.com/MrGiovanni/SimSID

Submitted 13 March, 2024; originally announced March 2024.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). arXiv admin note: substantial text overlap with arXiv:2111.13495

arXiv:2403.08160 [pdf, other]

Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime

Authors: Hong Hu, Yue M. Lu, Theodor Misiakiewicz

Abstract: Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data. It was shown, e.g., through the double descent phenomenon, that the number of parameters is a poor proxy for the model complexity and generalization capabilities. This leaves open the question of understanding the impact of parametrization on the performance of these models. How do model complexity and generalization depend on the number of parameters $p$? How should we choose $p$ relative to the sample size $n$ to achieve optimal test error? In this paper, we investigate the example of random feature ridge regression (RFRR). This model can be seen either as a finite-rank approximation to kernel ridge regression (KRR), or as a simplified model for neural networks trained in the so-called lazy regime. We consider covariates uniformly distributed on the $d$-dimensional sphere and compute sharp asymptotics for the RFRR test error in the high-dimensional polynomial scaling, where $p,n,d \to \infty$ while $p/ d^{κ_1}$ and $n / d^{κ_2}$ stay constant, for all $κ_1 , κ_2 \in \mathbb{R}_{>0}$. These asymptotics precisely characterize the impact of the number of random features and regularization parameter on the test performance. In particular, RFRR exhibits an intuitive trade-off between approximation and generalization power. For $n = o(p)$, the sample size $n$ is the bottleneck and RFRR achieves the same performance as KRR (which is equivalent to taking $p = \infty$).
On the other hand, if $p = o(n)$, the number of random features $p$ is the limiting factor and RFRR test error matches the approximation error of the random feature model class (akin to taking $n = \infty$). Finally, a double descent appears at $n= p$, a phenomenon that was previously only characterized in the linear scaling $κ_1 = κ_2 = 1$.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 106 pages, 8 figures
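
For readers unfamiliar with the model named in the abstract above, the following is a minimal sketch of random feature ridge regression on sphere-distributed covariates. It is an illustration only, not the authors' setup: the ReLU activation, the linear target, the noise level, and the sizes `d`, `n`, `p`, `lam` are all assumptions chosen for the example, and the helper names (`sample_sphere`, `features`, `rfrr_fit`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d, radius, rng):
    """Draw n points uniformly on the sphere of the given radius in R^d."""
    g = rng.standard_normal((n, d))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)

def features(X, W):
    """Random feature map, scaled by 1/sqrt(p). ReLU is an assumed activation."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

def rfrr_fit(X, y, W, lam):
    """Ridge regression on the random features: solve (Z^T Z + lam I) a = Z^T y."""
    Z = features(X, W)
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

# Illustrative sizes (assumptions): dimension d, samples n, random features p.
d, n, p, lam = 10, 200, 400, 1e-3

X_train = sample_sphere(n, d, np.sqrt(d), rng)   # covariates on the sphere
X_test = sample_sphere(500, d, np.sqrt(d), rng)
W = sample_sphere(p, d, 1.0, rng)                # random first-layer weights, kept fixed

# Assumed target: a noisy linear function of the covariates.
beta = sample_sphere(1, d, 1.0, rng)[0]
y_train = X_train @ beta + 0.1 * rng.standard_normal(n)
y_test = X_test @ beta

coef = rfrr_fit(X_train, y_train, W, lam)
train_mse = float(np.mean((features(X_train, W) @ coef - y_train) ** 2))
test_mse = float(np.mean((features(X_test, W) @ coef - y_test) ** 2))
print(f"train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

Only the second-layer coefficients `coef` are trained; the random weights `W` stay fixed, which is what makes the model a finite-rank stand-in for kernel ridge regression. Varying `p` and `n` in this sketch is one way to explore the sample-limited and feature-limited regimes the abstract describes, though a toy run at these sizes should not be expected to reproduce the paper's asymptotics.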

arXiv:2403.07952 [pdf, other]

AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

Authors: Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan, Kexiang Wang, Jian Liang, Yaxi Zhao, Yihen Lu, Gengliang Li, Junlong Gao, Xin Tu, Zhenyu Guo

Abstract: The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual users can leverage these modules easily. This innovative system would convert user story proposals into scripts, images, and audio, and then integrate these multimodal contents into videos. Additionally, the animating units (e.g., Gen-2 and Sora) could make the videos more infectious. The AesopAgent system could orchestrate task workflow for video generation, ensuring that the generated video is both rich in content and coherent. This system mainly contains two layers, i.e., the Horizontal Layer and the Utility Layer. In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow. It continuously evolves and iteratively optimizes workflow by accumulating expert experience and professional knowledge, including optimizing the LLM prompts and utilities usage. The Utility Layer provides multiple utilities, leading to consistent image generation that is visually coherent in terms of composition, characters, and style. Meanwhile, it provides audio and special effects, integrating them into expressive and logically arranged videos.
Overall, our AesopAgent achieves state-of-the-art performance compared with many previous works in visual storytelling. Our AesopAgent is designed for convenient service for individual users, which is available on the following page: https://aesopai.github.io/.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 22 pages, 13 figures

arXiv:2403.07874 [pdf, other]

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

Authors: Lei Zhu, Fangyun Wei, Yanye Lu

Abstract: In this work, we investigate the potential of a large language model (LLM) to directly comprehend visual signals without the necessity of fine-tuning on multi-modal datasets. The foundational concept of our method views an image as a linguistic entity, and translates it to a set of discrete words derived from the LLM's vocabulary. To achieve this, we present the Vision-to-Language Tokenizer, abbreviated as V2T Tokenizer, which transforms an image into a ``foreign language'' with the combined aid of an encoder-decoder, the LLM vocabulary, and a CLIP model. With this innovative image encoding, the LLM gains the ability not only for visual comprehension but also for image denoising and restoration in an auto-regressive fashion, crucially without any fine-tuning. We undertake rigorous experiments to validate our method, encompassing understanding tasks like image recognition, image captioning, and visual question answering, as well as image denoising tasks like inpainting, outpainting, deblurring, and shift restoration. Code and models are available at https://github.com/zh460045050/V2L-Tokenizer.

Submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.07153 [pdf, other]

2023 Low-Power Computer Vision Challenge (LPCVC) Summary

Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin, et al. (5 additional authors not shown)

Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accuracy with short execution time when their CV solutions run on an embedded device, such as Raspberry PI or Nvidia Jetson Nano. The vision problem for 2023 LPCVC is segmentation of images acquired by Unmanned Aerial Vehicles (UAVs, also called drones) after disasters. The 2023 LPCVC attracted 60 international teams that submitted 676 solutions during the submission window of one month. This article explains the setup of the competition and highlights the winners' methods that improve accuracy and shorten execution time.

Submitted 11 March, 2024; originally announced March 2024.

Comments: LPCVC 2023, website: https://lpcv.ai/

arXiv:2403.06766 [pdf, other]

Determination of the number of $ψ(3686)$ events taken at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.

Submitted 28 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.
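
The totals quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below uses only the three per-period numbers from the abstract. Adding the uncertainties linearly happens to reproduce the quoted total uncertainty, which would be consistent with treating the systematic uncertainties as fully correlated; that reading is an assumption made here, not a statement of the collaboration's procedure. The quadrature sum is shown only for comparison.

```python
# Event counts and uncertainties in units of 10^6, copied from the abstract.
samples = {
    "2009": (107.7, 0.6),
    "2012": (345.4, 2.6),
    "2021": (2259.3, 11.1),
}

# Central value: plain sum over the three run periods.
total = sum(count for count, _ in samples.values())

# Assumption: uncertainties combined linearly (fully correlated systematics).
linear_unc = sum(unc for _, unc in samples.values())

# For comparison only: quadrature sum (as if the uncertainties were uncorrelated).
quad_unc = sum(unc ** 2 for _, unc in samples.values()) ** 0.5

print(f"total      = {total:.1f} x 10^6")
print(f"linear     = {linear_unc:.1f} x 10^6")
print(f"quadrature = {quad_unc:.1f} x 10^6")
```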

arXiv:2403.06148 [pdf, other]

OS-FPI: A Coarse-to-Fine One-Stream Network for UAV Geo-Localization

Authors: Jiahao Chen, Enhui Zheng, Ming Dai, Yifu Chen, Yusheng Lu

Abstract: The geo-localization and navigation technology of unmanned aerial vehicles (UAVs) in denied environments is currently a prominent research area. Prior approaches mainly employed a two-stream network with non-shared weights to extract features from UAV and satellite images separately, followed by related modeling to obtain the response map. However, the two-stream network extracts UAV and satellite features independently. This approach significantly affects the efficiency of feature extraction and increases the computational load. To address these issues, we propose a novel coarse-to-fine one-stream network (OS-FPI). Our approach allows information exchange between UAV and satellite features during early image feature extraction. To improve the model's performance, the framework retains feature maps generated at different stages of the feature extraction process for the feature fusion network, and establishes additional connections between UAV and satellite feature maps in the feature fusion network. Additionally, the framework introduces offset prediction to further refine and optimize the model's prediction results based on the classification tasks. Our proposed model boasts a similar inference speed to FPI while significantly reducing the number of parameters. It can achieve better performance with fewer parameters under the same conditions. Moreover, it achieves state-of-the-art performance on the UL14 dataset. Compared to previous models, our model achieved a significant 10.92-point improvement on the RDS metric, reaching 76.25.
Furthermore, its performance in meter-level localization accuracy is impressive, with 182.62% improvement in 3-meter accuracy, 164.17% improvement in 5-meter accuracy, and 137.43% improvement in 10-meter accuracy.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.06146 [pdf, ps, other]

Some combinatorial aspects of (q,2)-Fock space

Authors: Yungang Lu

Abstract: We introduce the (q,2)-Fock space over a given Hilbert space, calculate the explicit form of a product of the creation and annihilation operators acting on the vacuum vector, demonstrate that this explicit form involves a specific subset of the set of all pair partitions, and provide a detailed characterization of this subset.

Submitted 10 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.19447

arXiv:2403.06028 [pdf, other]

Fully discretized Sobolev gradient flow for the Gross-Pitaevskii eigenvalue problem

Authors: Ziang Chen, Jianfeng Lu, Yulong Lu, Xiangxiong Zhang

Abstract: For the ground state of the Gross-Pitaevskii (GP) eigenvalue problem, we consider a fully discretized Sobolev gradient flow, which can be regarded as the Riemannian gradient descent on the sphere under a metric induced by a modified $H^1$-norm. We prove its global convergence to a critical point of the discrete GP energy and its local exponential convergence to the ground state of the discrete GP energy. The local exponential convergence rate depends on the eigengap of the discrete GP energy. When the discretization is the classical second-order finite difference in two dimensions, such an eigengap can be further proven to be mesh independent, i.e., it has a uniform positive lower bound, thus the local exponential convergence rate is mesh independent. Numerical experiments with discretization by high order $Q^k$ spectral element methods in two and three dimensions are provided to validate the efficiency of the proposed method.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.05918 [pdf]

SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data

Authors: Ming Zheng, Yang Yang, Zhi-Hang Zhao, Shan-Chao Gan, Yang Chen, Si-Kai Ni, Yang Lu

Abstract: In the field of data mining and machine learning, commonly used classification models cannot effectively learn in unbalanced data. In order to balance the data distribution before model training, oversampling methods are often used to generate data for a small number of classes to solve the problem of classifying unbalanced data. Most of the classical oversampling methods are based on the SMOTE technique, which only focuses on the local information of the data, and therefore the generated data may have the problem of not being realistic enough. In the current oversampling methods based on generative networks, the methods based on GANs can capture the true distribution of data, but there is the problem of pattern collapse and training instability in training; in the oversampling methods based on denoising diffusion probability models, the neural network of the inverse diffusion process using the U-Net is not applicable to tabular data, and although the MLP can be used to replace the U-Net, its simple structure leads to poor noise removal. In order to overcome the above problems, we propose a novel oversampling method SEMRes-DDPM. In the SEMRes-DDPM backward diffusion process, a new neural network structure SEMST-ResNet is used, which is suitable for tabular data and has good noise removal effect, and it can generate tabular data with higher quality. Experiments show that the SEMResNet network removes noise better than MLP; SEMRes-DDPM generates data distributions that are closer to the real data distributions than TabDDPM with CWGAN-GP; on 20 real unbalanced tabular datasets with 9 classification models, SEMRes-DDPM improves the quality of the generated tabular data in terms of three evaluation metrics (F1, G-mean, AUC) with better classification performance than other SOTA oversampling methods.

Submitted 11 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: None

arXiv:2403.05831 [pdf, other]

BRDF-Based Photometric Modeling of LEO Constellation Satellite from Massive Observations

Authors: Yao Lu

Abstract: Modeling the brightness of satellites in large Low-Earth Orbit (LEO) constellations can not only assist the astronomical community in assessing the impact of reflected light from satellites, optimizing observing schedules and guiding data processing, but also motivate satellite operators to improve their satellite designs, thus facilitating cooperation and consensus among different stakeholders. This work presents a photometric model of the Starlink satellites based on the Bidirectional Reflectance Distribution Function (BRDF) using millions of photometric observations. To enhance model accuracy and computational efficiency, data filtering and reduction are employed, and chassis blocking on the solar array and the earthshine effect are taken into account. The assumptions of the model are also validated by showing that the satellite attitude is as expected, the solar array is nearly perpendicular to the chassis, and both the solar array pseudo-specular reflection and the chassis earthshine should be included in the model. Reflectance characteristics of the satellites and the apparent magnitude distributions over station are finally discussed based on the photometric predictions from the model. In addition to assessing the light pollution and guiding the development of response measures, accurate photometric models of satellites can also play an important role in areas such as space situational awareness.

Submitted 15 May, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: 16 pages, 9 figures

arXiv:2403.05416 [pdf, other]

SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection

Authors: Yahao Lu, Yupei Lin, Han Wu, Xiaoyu Xian, Yukai Shi, Liang Lin

Abstract: Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds. Recently, convolutional neural networks have achieved significant advantages in general object detection. With the development of Transformer, the scale of SIRST models is constantly increasing. Due to the limited training samples, performance has not been improved accordingly. The quality, quantity, and diversity of the infrared dataset are critical to the detection of small targets. To highlight this issue, we propose a negative sample augmentation method in this paper. Specifically, a negative augmentation approach is proposed to generate massive negatives for self-supervised learning. Firstly, we perform a sequential noise modeling technology to generate realistic infrared data. Secondly, we fuse the extracted noise with the original data to facilitate diversity and fidelity in the generated data. Lastly, we propose a negative augmentation strategy to enrich diversity as well as maintain semantic invariance. The proposed algorithm produces a synthetic SIRST-5K dataset, which contains massive pseudo-data and corresponding labels. With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed. Compared with other state-of-the-art (SOTA) methods, our method achieves outstanding performance in terms of probability of detection (Pd), false-alarm rate (Fa), and intersection over union (IoU).

Submitted 8 March, 2024; originally announced March 2024.

Comments: We address the quality, quantity, and diversity of the infrared data in SIRST, the dataset is available at: https://github.com/luy0222/SIRST-5K

arXiv:2403.05090 [pdf, other]

OCEAN: An Openspace Collision-free Trajectory Planner for Autonomous Parking Based on ADMM

Authors: Dongxu Wang, Yanbin Lu, Weilong Liu, Hao Zuo, Jiade Xin, Xiang Long, Yuncheng Jiang

Abstract: In this paper, we propose an Openspace Collision-freE trAjectory plaNner (OCEAN) for autonomous parking. OCEAN is an optimization-based trajectory planner accelerated by Alternating Direction Method of Multiplier (ADMM) with enhanced computational efficiency and robustness, and is suitable for all scenes with few dynamic obstacles. Starting from a hierarchical optimization-based collision avoidance framework, the trajectory planning problem is first warm-started by a collision-free Hybrid A* trajectory, then the collision avoidance trajectory planning problem is reformulated as a smooth and convex dual form, and solved by ADMM in parallel. The optimization variables are carefully split into several groups so that ADMM sub-problems are formulated as Quadratic Programming (QP), Sequential Quadratic Programming (SQP), and Second Order Cone Programming (SOCP) problems that can be efficiently and robustly solved. We validate our method both in hundreds of simulation scenarios and hundreds of hours of public parking areas. The results show that the proposed method has better system performance compared with other benchmarks.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 8 pages, 5 figures

arXiv:2403.04549 [pdf, other]

Explainable Face Verification via Feature-Guided Gradient Backpropagation

Authors: Yuhang Lu, Zewei Xu, Touradj Ebrahimi

Abstract: Recent years have witnessed significant advancement in face recognition (FR) techniques, with their applications widely spread in people's lives and security-sensitive areas. There is a growing need for reliable interpretations of decisions of such systems. Existing studies relying on various mechanisms have investigated the usage of saliency maps as an explanation approach, but suffer from different limitations. This paper first explores the spatial relationship between face image and its deep representation via gradient backpropagation. Then a new explanation approach FGGB has been conceived, which provides precise and insightful similarity and dissimilarity saliency maps to explain the "Accept" and "Reject" decision of an FR system. Extensive visual presentation and quantitative measurement have shown that FGGB achieves superior performance in both similarity and dissimilarity maps when compared to current state-of-the-art explainable face verification approaches.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.04484 [pdf, other]

Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging

Authors: Dovile Juodelyte, Yucheng Lu, Amelia Jiménez-Sánchez, Sabrina Bottazzi, Enzo Ferrante, Veronika Cheplygina

Abstract: Transfer learning has become an essential part of medical imaging classification algorithms, often leveraging ImageNet weights. However, the domain shift from natural to medical images has prompted alternatives such as RadImageNet, often demonstrating comparable classification performance. However, it remains unclear whether the performance gains from transfer learning stem from improved generalization or shortcut learning. To address this, we investigate potential confounders -- whether synthetic or sampled from the data -- across two publicly available chest X-ray and CT datasets. We show that ImageNet and RadImageNet achieve comparable classification performance, yet ImageNet is much more prone to overfitting to confounders. We recommend that researchers using ImageNet-pretrained models reexamine their model robustness by conducting similar experiments. Our code and experiments are available at https://github.com/DovileDo/source-matters.

Submitted 7 March, 2024; originally announced March 2024.

Comments: Submitted to MICCAI 2024

arXiv:2403.04459 [pdf, ps, other]

An efficient method for calculating resonant modes in biperiodic photonic structures

Authors: Nan Zhang, Ya Yan Lu

Abstract: Many photonic devices, such as photonic crystal slabs, cross gratings, and periodic metasurfaces, are biperiodic structures with two independent periodic directions, and are sandwiched between two homogeneous media. Many applications of these devices are closely related to resonance phenomena. Therefore, efficient computation of resonant modes is crucial in device design and structure analysis. Since resonant modes satisfy outgoing radiation conditions, perfectly matched layers (PMLs) are usually used to truncate the unbounded spatial variable perpendicular to the periodic directions. In this paper, we develop an efficient method without using PMLs to calculate resonant modes in biperiodic structures. We reduce the original eigenvalue problem to a small matrix nonlinear eigenvalue problem which is solved by the contour integral method. Numerical examples show that our method is efficient with respect to memory usage and CPU time, free of spurious solutions, and determines degenerate resonant modes without any difficulty.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03853 [pdf, other]

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Authors: Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen

Abstract: As Large Language Models (LLMs) continue to advance in performance, their size has escalated significantly, with current LLMs containing billions or even trillions of parameters. However, in this study, we discovered that many layers of LLMs exhibit high similarity, and some layers play a negligible role in network functionality. Based on this observation, we define a metric called Block Influence (BI) to gauge the significance of each layer in LLMs. We then propose a straightforward pruning approach: layer removal, in which we directly delete the redundant layers in LLMs based on their BI scores. Experiments demonstrate that our method, which we call ShortGPT, significantly outperforms previous state-of-the-art (SOTA) methods in model pruning. Moreover, ShortGPT is orthogonal to quantization-like methods, enabling further reduction in parameters and computation. The ability to achieve better results through simple layer removal, as opposed to more complex pruning techniques, suggests a high degree of redundancy in the model architecture.

Submitted 7 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.
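
Illustrative note on the entry above: the abstract names a Block Influence (BI) score and a layer-removal step but does not give the formula. The sketch below is only one plausible reading, under the assumption that a layer's influence is measured as one minus the mean cosine similarity between the hidden states entering and leaving it. The function names `block_influence` and `layers_to_remove` and the toy hidden states are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_influence(inputs, outputs):
    """Assumed score: 1 - mean cosine similarity between the per-token
    hidden states entering a layer and those leaving it."""
    sims = [cosine(x, y) for x, y in zip(inputs, outputs)]
    return 1.0 - sum(sims) / len(sims)


def layers_to_remove(hidden_states, n_remove):
    """hidden_states[i] holds the token vectors entering layer i and
    hidden_states[i + 1] those leaving it. Returns the indices of the
    n_remove lowest-scoring layers together with all scores."""
    scores = [
        block_influence(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:n_remove]), scores


# Toy example: three layers acting on two 2-d token vectors (made-up data).
states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # layer 0 rotates each vector
    [[0.0, 2.0], [2.0, 0.0]],  # layer 1 only rescales (direction unchanged)
    [[1.0, 1.0], [1.0, 1.0]],  # layer 2 changes direction again
]
removed, scores = layers_to_remove(states, n_remove=1)
print(removed, [round(s, 3) for s in scores])
```

In this toy setup the layer that only rescales its input receives the lowest score and is the one selected for removal; a real pipeline would compute the scores from model activations on a calibration set.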

arXiv:2403.03655 [pdf, other]

Kronos: A Secure and Generic Sharding Blockchain Consensus with Optimized Overhead

Authors: Yizhong Liu, Andi Liu, Yuan Lu, Zhuocheng Pan, Yinuo Li, Jianwei Liu, Song Bian, Mauro Conti

Abstract: Sharding enhances blockchain scalability by dividing the network into shards, each managing specific unspent transaction outputs or accounts. As an introduced new transaction type, cross-shard transactions pose a critical challenge to the security and efficiency of sharding blockchains. Currently, there is a lack of a generic sharding consensus pattern that achieves both security and low overhead. In this paper, we present Kronos, a secure sharding blockchain consensus achieving optimized overhead. In particular, we propose a new secure sharding consensus pattern, based on a buffer managed jointly by shard members. Valid transactions are transferred to the payee via the buffer, while invalid ones are rejected through happy or unhappy paths. Kronos is proved to achieve security with atomicity under malicious clients with optimal intra-shard overhead $kB$ ($k$ for involved shard number and $B$ for a Byzantine fault tolerance (BFT) cost). Besides, we propose secure cross-shard certification methods based on batch certification and reliable cross-shard transfer. The former combines hybrid trees or vector commitments, while the latter integrates erasure coding. Handling $b$ transactions, Kronos is proved to achieve reliability with low cross-shard overhead $O(n b λ)$ ($n$ for shard size and $λ$ for the security parameter). Notably, Kronos imposes no restrictions on BFT and does not rely on time assumptions, offering optional constructions in various modules. We implement Kronos using two prominent BFT protocols: asynchronous Speeding Dumbo and partial synchronous Hotstuff. Extensive experiments demonstrate Kronos scales the consensus nodes to thousands, achieving a substantial throughput of 320 ktx/sec with 2.0 sec latency. Compared with the past solutions, Kronos outperforms, achieving up to a 12× improvement in throughput and a 50% reduction in latency.

Submitted 5 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03500 [pdf, other]

Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 3 figures
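
Illustrative note on the entry above: the abstract quotes each branching fraction with separate statistical and systematic uncertainties. A reader who wants a single overall uncertainty often adds the two in quadrature, which assumes they are independent; the abstract itself does not quote combined values, so the sketch below is a reader-side convenience and not part of the reported result. The helper name `total_uncertainty` is hypothetical.

```python
import math


def total_uncertainty(stat, syst):
    """Quadrature sum of two uncertainties, assuming they are independent."""
    return math.hypot(stat, syst)


# (central value, statistical, systematic), all in units of 1e-3,
# copied from the abstract above.
branching_fractions = {
    "h_c -> 3(pi+ pi-) pi0": (9.28, 1.14, 0.77),
    "h_c -> 2(pi+ pi-) pi0 eta": (7.55, 1.51, 0.77),
    "h_c -> 2(pi+ pi-) omega": (4.00, 0.86, 0.35),
}

for mode, (central, stat, syst) in branching_fractions.items():
    total = total_uncertainty(stat, syst)
    print(f"{mode}: ({central:.2f} +/- {total:.2f}) x 10^-3")
```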

arXiv:2403.02819 [pdf, other]

Unraveling spin texture and spin-orbit coupling contributions in spin triplet superconductivity

Authors: Pablo Tuero, César González-Ruano, Yuan Lu, Coriolan Tiusan, Farkhad G. Aliev

Abstract: Over the past decade, it has been proposed theoretically and confirmed experimentally that long-range spin triplet (LRT) superconductivity can be generated in ferromagnet-superconductor hybrids either by the presence of spin textures (ST-LRT) or thanks to spin-orbit coupling (SOC-LRT). Nevertheless, there has been no theoretical or experimental investigation to date suggesting that both contributions could simultaneously exist within an experimental system. To disentangle these contributions, we present a comprehensive study of superconducting quasiparticle interference effects taking place inside a ferromagnetic layer interfacing a superconductor, through the investigation of above-gap conductance anomalies (CAs) related to McMillan-Rowell resonances. The bias dependence of the CAs has been studied under a wide range of in-plane (IP) and out-of-plane (OOP) magnetic fields in two types of epitaxial, V/MgO/Fe-based ferromagnet-superconductor junctions with interfacial spin-orbit coupling. We observe an anisotropy of the CAs amplitude under small IP and OOP magnetic fields while remaining weakly affected by high fields, and implement micromagnetic simulations to help us distinguish between the ST-LRT and SOC-LRT contributions. Our findings suggest that further exploration of Fabry-Pérot type interference effects in electron transport could yield valuable insights into the hybridization between superconductors and ferromagnets induced by spin-orbit coupling and spin textures.

Submitted 7 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Submitted for Peer review

arXiv:2403.02345 [pdf, ps, other]

Some Probabilistic aspects of (q,2)-Fock space

Authors: Yungang Lu

Abstract: This paper primarily focuses on the investigation of the distribution of certain crucial operators with respect to significant states on the (q,2)-Fock space, for instance, the vacuum distribution of the field operator.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2403.01773 [pdf, other]

Improving out-of-distribution generalization in graphs via hierarchical semantic environments

Authors: Yinhua Piao, Sangseon Lee, Yijingxiu Lu, Sun Kim

Abstract: Out-of-distribution (OOD) generalization in the graph domain is challenging due to complex distribution shifts and a lack of environmental contexts. Recent methods attempt to enhance graph OOD generalization by generating flat environments. However, such flat environments come with inherent limitations to capture more complex data distributions. Considering the DrugOOD dataset, which contains diverse training environments (e.g., scaffold, size, etc.), flat contexts cannot sufficiently address its high heterogeneity. Thus, a new challenge is posed to generate more semantically enriched environments to enhance graph invariant learning for handling distribution shifts. In this paper, we propose a novel approach to generate hierarchical semantic environments for each graph. Firstly, given an input graph, we explicitly extract variant subgraphs from the input graph to generate proxy predictions on local environments. Then, stochastic attention mechanisms are employed to re-extract the subgraphs for regenerating global environments in a hierarchical manner. In addition, we introduce a new learning objective that guides our model to learn the diversity of environments within the same hierarchy while maintaining consistency across different hierarchies. This approach enables our model to consider the relationships between environments and facilitates robust graph invariant learning. Extensive experiments on real-world graph data have demonstrated the effectiveness of our framework. Particularly, in the challenging dataset DrugOOD, our method achieves up to 1.29% and 2.83% improvement over the best baselines on IC50 and EC50 prediction tasks, respectively.

Submitted 3 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.01761 [pdf, other]

Observation of $ψ(3686)\to 3φ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (645 additional authors not shown)

Abstract: Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01731 [pdf, other]

RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features

Authors: Howard H. Qian, Yangxiao Lu, Kejia Ren, Gaotian Wang, Ninad Khargonkar, Yu Xiang, Kaiyu Hang

Abstract: In order to successfully perform manipulation tasks in new environments, such as grasping, robots must be proficient in segmenting unseen objects from the background and/or other objects. Previous works perform unseen object instance segmentation (UOIS) by training deep neural networks on large-scale data to learn RGB/RGB-D feature embeddings, where cluttered environments often result in inaccurate segmentations. We build upon these methods and introduce a novel approach to correct inaccurate segmentation, such as under-segmentation, of static image-based UOIS masks by using robot interaction and a designed body frame-invariant feature. We demonstrate that the relative linear and rotational velocities of frames randomly attached to rigid bodies due to robot interactions can be used to identify objects and accumulate corrected object-level segmentation masks. By introducing motion to regions of segmentation uncertainty, we are able to drastically improve segmentation accuracy in an uncertainty-driven manner with minimal, non-disruptive interactions (ca. 2-3 per scene). We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%, an increase of 28.2% when compared with other state-of-the-art UOIS methods.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 7 pages, 5 figures, ICRA 2024

arXiv:2403.01677 [pdf, other]

Elliptic anisotropy of open-charm hadron in p--Pb collisions at LHC energies from parton scatterings

Authors: Siyu Tang, Yuan Lu, Chao Zhang, RenZhuo Wan

Abstract: The elliptic azimuthal anisotropy coefficient ($v_{2}$) of open-charm hadron at midrapidity ($|η|<1$) was studied in p--Pb collisions at $\sqrt{s_{\mathrm{NN}}}=$ 8.16 TeV using a multi-phase transport model (AMPT). By implementing an additional heavy quark--antiquark pair production trigger in the AMPT, we obtained a simultaneous description of the $p_{\mathrm{T}}$ spectrum and $v_{2}$ of $D^{0}$ meson. Then the predictions for the $v_{2}$ of charm hadrons including $D^{+}$, $D_{s}^{+}$ and $Λ_{c}^{+}$ in p--Pb collisions are provided for the first time. We found that the $v_{2}$ of open-charm hadron follows the number-of-constituent-quark (NCQ) scaling in high-multiplicity p--Pb collisions, and is significantly affected by the parton scattering process. These findings further demonstrate the importance of partonic degrees of freedom in small collision systems for heavy flavors, and provide referential value for future measurements of azimuthal anisotropy at the LHC energies.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 6 pages, 5 figures, submitted to Physical Review C

arXiv:2403.01414 [pdf, other]

Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes

Authors: Yujie Lu, Long Wan, Nayu Ding, Yulong Wang, Shuhan Shen, Shen Cai, Lin Gao

Abstract: Neural implicit representation of geometric shapes has witnessed considerable advancements in recent years. However, common distance field based implicit representations, specifically signed distance field (SDF) for watertight shapes or unsigned distance field (UDF) for arbitrary shapes, routinely suffer from degradation of reconstruction accuracy when converting to explicit surface points and meshes. In this paper, we introduce a novel neural implicit representation based on unsigned orthogonal distance fields (UODFs). In UODFs, the minimal unsigned distance from any spatial point to the shape surface is defined solely in one orthogonal direction, contrasting with the multi-directional determination made by SDF and UDF. Consequently, every point in the 3D UODFs can directly access its closest surface points along three orthogonal directions. This distinctive feature leverages the accurate reconstruction of surface points without interpolation errors. We verify the effectiveness of UODFs through a range of reconstruction examples, extending from simple watertight or non-watertight shapes to complex shapes that include hollows, internal or assembling structures.

Submitted 1 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: accepted by CVPR 2024

arXiv:2403.00982 [pdf, other]

LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems

Authors: Xiao Yu, Yunan Lu, Zhou Yu

Abstract: Retrieval-augmented question-answering systems combine retrieval techniques with large language models to provide answers that are more accurate and informative. Many existing toolkits allow users to quickly build such systems using off-the-shelf models, but they fall short in supporting researchers and developers to customize the model training, testing, and deployment process. We propose LocalRQA, an open-source toolkit that features a wide selection of model training algorithms, evaluation methods, and deployment tools curated from the latest research. As a showcase, we build QA systems using online documentation obtained from Databricks and Faire's websites. We find 7B-models trained and deployed using LocalRQA reach a similar performance compared to using OpenAI's text-ada-002 and GPT-4-turbo.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2403.00801 [pdf, other]

Self-Retrieval: Building an Information Retrieval System with One Large Language Model

Authors: Qiaoyu Tang, Jiawei Chen, Bowen Yu, Yaojie Lu, Cheng Fu, Haiyang Yu, Hongyu Lin, Fei Huang, Ben He, Xianpei Han, Le Sun, Yongbin Li

Abstract: The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in the way humans access information. Due to the isolated architecture and the limited interaction, existing IR systems are unable to fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrieval, an end-to-end, LLM-driven information retrieval architecture that can fully internalize the required abilities of IR systems into a single LLM and deeply leverage the capabilities of LLMs during the IR process. Specifically, Self-Retrieval internalizes the corpus to retrieve into an LLM via a natural language indexing architecture. Then the entire retrieval process is redefined as a procedure of document generation and self-assessment, which can be end-to-end executed using a single large language model. Experimental results demonstrate that Self-Retrieval not only outperforms previous retrieval approaches by a large margin, but can also significantly boost the performance of LLM-driven downstream applications like retrieval-augmented generation.

Submitted 23 February, 2024; originally announced March 2024.

arXiv:2403.00795 [pdf, other]

Executing Natural Language-Described Algorithms with Large Language Models: An Investigation

Authors: Xin Zheng, Qiming Zhu, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun

Abstract: Executing computer programs described in natural language has long been a pursuit of computer science. With the advent of enhanced natural language understanding capabilities exhibited by large language models (LLMs), the path toward this goal has been illuminated. In this paper, we seek to examine the capacity of present-day LLMs to comprehend and execute algorithms outlined in natural language. We established an algorithm test set sourced from Introduction to Algorithms, a well-known textbook that contains many representative, widely used algorithms. To systematically assess LLMs' code execution abilities, we selected 30 algorithms, generated 300 randomly sampled instances in total, and evaluated whether popular LLMs can understand and execute these algorithms. Our findings reveal that LLMs, notably GPT-4, can effectively execute programs described in natural language, as long as no heavy numeric computation is involved. We believe our findings contribute to evaluating LLMs' code execution abilities and would encourage further investigation and application for the computation power of LLMs.

Submitted 14 March, 2024; v1 submitted 23 February, 2024; originally announced March 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2403.00327 [pdf, other]

Task Indicating Transformer for Task-conditional Dense Predictions

Authors: Yuxiang Lu, Shalayiding Sirejiding, Bayram Bayramli, Suizhi Huang, Yue Ding, Hongtao Lu

Abstract: The task-conditional model is a distinctive stream for efficient multi-task learning. Existing works encounter a critical limitation in learning task-agnostic and task-specific representations, primarily due to shortcomings in global context modeling arising from CNN-based architectures, as well as a deficiency in multi-scale feature interaction within the decoder. In this paper, we introduce a novel task-conditional framework called Task Indicating Transformer (TIT) to tackle this challenge. Our approach designs a Mix Task Adapter module within the transformer block, which incorporates a Task Indicating Matrix through matrix decomposition, thereby enhancing long-range dependency modeling and parameter-efficient feature adaptation by capturing intra- and inter-task features. Moreover, we propose a Task Gate Decoder module that harnesses a Task Indicating Vector and gating mechanism to facilitate adaptive multi-scale feature refinement guided by task embeddings. Experiments on two public multi-task dense prediction benchmarks, NYUD-v2 and PASCAL-Context, demonstrate that our approach surpasses state-of-the-art task-conditional methods.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Accepted by ICASSP 2024

arXiv:2403.00275 [pdf, other]

Crosstalk-Robust Quantum Control in Multimode Bosonic Systems

Authors: Xinyuan You, Yunwei Lu, Taeyoon Kim, Doga Murat Kurkcuoglu, Shaojiang Zhu, David van Zanten, Tanay Roy, Yao Lu, Srivatsan Chakram, Anna Grassellino, Alexander Romanenko, Jens Koch, Silvia Zorzetti

Abstract: High-coherence superconducting cavities offer a hardware-efficient platform for quantum information processing. To achieve universal operations of these bosonic modes, the requisite nonlinearity is realized by coupling them to a transmon ancilla. However, this configuration is susceptible to crosstalk errors in the dispersive regime, where the ancilla frequency is Stark-shifted by the state of each coupled bosonic mode. This leads to a frequency mismatch of the ancilla drive, lowering the gate fidelities. To mitigate such coherent errors, we employ quantum optimal control to engineer ancilla pulses that are robust to the frequency shifts. These optimized pulses are subsequently integrated into a recently developed echoed conditional displacement (ECD) protocol for executing single- and two-mode operations. Through numerical simulations, we examine two representative scenarios: the preparation of single-mode Fock states in the presence of spectator modes and the generation of two-mode entangled Bell-cat states. Our approach markedly suppresses crosstalk errors, outperforming conventional ancilla control methods by orders of magnitude. These results provide guidance for experimentally achieving high-fidelity multimode operations and pave the way for developing high-performance bosonic quantum information processors.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 16 pages, 9 figures

Report number: FERMILAB-PUB-24-0033-SQMS

arXiv:2403.00245 [pdf, other]

YOLO-MED: Multi-Task Interaction Network for Biomedical Images

Authors: Suizhi Huang, Shalayiding Sirejiding, Yuxiang Lu, Yue Ding, Leheng Liu, Hui Zhou, Hongtao Lu

Abstract: Object detection and semantic segmentation are pivotal components in biomedical image analysis. Current single-task networks exhibit promising outcomes in both detection and segmentation tasks. Multi-task networks have gained prominence due to their capability to simultaneously tackle segmentation and detection tasks, while also accelerating the segmentation inference. Nevertheless, recent multi-task networks confront distinct limitations such as the difficulty in striking a balance between accuracy and inference speed. Additionally, they often overlook the integration of cross-scale features, which is especially important for biomedical image analysis. In this study, we propose an efficient end-to-end multi-task network capable of concurrently performing object detection and semantic segmentation called YOLO-Med. Our model employs a backbone and a neck for multi-scale feature extraction, complemented by the inclusion of two task-specific decoders. A cross-scale task-interaction module is employed in order to facilitate information fusion between various tasks. Our model exhibits promising results in balancing accuracy and speed when evaluated on the Kvasir-seg dataset and a private biomedical image dataset.

Submitted 29 February, 2024; originally announced March 2024.

Comments: Accepted by ICASSP 2024

arXiv:2402.19447 [pdf, ps, other]

Weighted Catalan convolution and $(q,2)$-Fock space

Authors: Yungang Lu

Abstract: Motivated by the study of certain combinatorial properties of $(q,2)$-Fock space, we compute explicitly a sequence driven by the Catalan's convolution and parameterized by $1+q$. As an application of this explicit form, we calculate the number of pair partitions involved in the determination of the vacuum--moments of the field operator defined on the $(q,2)$-Fock space.