-
Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contributions from resonances decaying to $D^{\ast-}D^{+}$ and $D^{\ast+}D^{-}$ states linked by $C$ parity. This procedure allows the $C$-parities of resonances in the $D^{\ast\pm}D^{\mp}$ mass spectra to be determined. Four charmonium(-like) states are observed decaying into $D^{\ast\pm}D^{\mp}$: $η_c(3945)$, $h_c(4000)$, $χ_{c1}(4010)$ and $h_c(4300)$, with quantum numbers $J^{PC}$ equal to $0^{-+}$, $1^{+-}$, $1^{++}$ and $1^{+-}$, respectively. At least three of these states have not been observed previously. In addition, the existence of the $T_{\bar{c}\bar{s}0}^{*}(2870)^{0}$ and $T_{\bar{c}\bar{s}1}^{*}(2900)^{0}$ resonances in the $D^-K^+$ mass spectrum, already observed in the $B^+ \to D^+ D^- K^+$ decay, is confirmed in a different production channel.
Submitted 5 June, 2024;
originally announced June 2024.
-
Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models
Authors:
Jerry Yao-Chieh Hu,
Maojiang Su,
En-Jui Kuo,
Zhao Song,
Han Liu
Abstract:
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of nearly linear algorithms by controlling the LoRA update computation term by term, assuming the Strong Exponential Time Hypothesis (SETH). For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $\mathbf{X}$, pretrained weights $\mathbf{W^\star}$, and adapter matrices $α\mathbf{B} \mathbf{A} / r$. Specifically, we derive a shared upper bound threshold for such norms and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $\mathbf{W}_V$ and $\mathbf{W}_Q$) and full adaptations (e.g., $\mathbf{W}_Q$, $\mathbf{W}_V$, and $\mathbf{W}_K$) of weights in attention heads.
Submitted 5 June, 2024;
originally announced June 2024.
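The LoRA abstract above refers to adapter matrices $α\mathbf{B}\mathbf{A}/r$ applied on top of pretrained weights $\mathbf{W^\star}$. As a rough illustration only, the sketch below forms the adapted weight $\mathbf{W^\star} + α\mathbf{B}\mathbf{A}/r$ for tiny dense matrices. The function names, shapes, and numbers are hypothetical, and this is the plain dense update, not the sub-quadratic approximation algorithms the paper studies.

```python
# Minimal sketch of a rank-r LoRA weight update: W = W_star + (alpha / r) * B @ A.
# All names and values are illustrative assumptions, not taken from the paper.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_adapted_weight(w_star, b, a, alpha):
    """Return W_star + (alpha / r) * B A, where r is the number of rows of A."""
    r = len(a)
    if any(len(row) != r for row in b):
        raise ValueError("B must have r columns, matching the r rows of A")
    delta = matmul(b, a)  # low-rank update of rank at most r
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_star, delta)
    ]


W_STAR = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # pretrained, 3 x 3
B = [[1.0], [2.0], [3.0]]  # 3 x r with r = 1
A = [[0.5, 0.0, -0.5]]     # r x 3
W = lora_adapted_weight(W_STAR, B, A, alpha=2.0)
```

Only `B` and `A` would be trained in such a setup, so the number of trainable entries grows with `r` rather than with the full size of `W_STAR`.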
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of $9.6σ$ after taking into account systematic uncertainties. Evidence for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ is found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at 90% confidence level.
Submitted 5 June, 2024;
originally announced June 2024.
-
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
Authors:
Hsuan Su,
Hua Farn,
Fan-Yun Sun,
Shang-Tse Chen,
Hung-yi Lee
Abstract:
Synthetic data is widely used in speech recognition due to the availability of text-to-speech models, which facilitate adapting models to previously unseen text domains. However, existing methods lose performance when they fine-tune an automatic speech recognition (ASR) model on synthetic data, because of the distributional shift commonly referred to as the synthetic-to-real gap. In this paper, we find that task vector arithmetic is effective at mitigating this gap. Our proposed method, SYN2REAL task vector, shows an average improvement of 10.03\% in word error rate over baselines on the SLURP dataset. Additionally, we show that an average of SYN2REAL task vectors, when we have real speech from multiple different domains, can further adapt the original ASR model to perform better on the target text domain.
Submitted 15 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
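The abstract above relies on task vector arithmetic. A minimal sketch of the generic idea follows, assuming the common formulation in which a task vector is the element-wise difference between fine-tuned and base parameters and is added, optionally scaled, to another model's parameters. The helper names, the toy parameter dictionaries, and the scale value are hypothetical; how the SYN2REAL vector itself is constructed is not specified in the abstract and is not reproduced here.

```python
# Generic task-vector arithmetic on toy parameter dictionaries.
# Names and values are illustrative assumptions, not the paper's code.

def task_vector(finetuned, base):
    """Element-wise difference finetuned - base for every parameter tensor."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def apply_task_vector(params, vector, scale=1.0):
    """Add a (scaled) task vector to a set of parameters."""
    return {k: [p + scale * v for p, v in zip(params[k], vector[k])] for k in params}


def average_vectors(vectors):
    """Average several task vectors that share the same keys and shapes."""
    n = len(vectors)
    return {k: [sum(vals) / n for vals in zip(*(v[k] for v in vectors))] for k in vectors[0]}


base = {"w": [1.0, 2.0]}
tuned_a = {"w": [1.5, 1.0]}
tuned_b = {"w": [2.5, 2.0]}

tau_a = task_vector(tuned_a, base)      # {"w": [0.5, -1.0]}
tau_b = task_vector(tuned_b, base)      # {"w": [1.5, 0.0]}
tau_avg = average_vectors([tau_a, tau_b])
adapted = apply_task_vector({"w": [2.0, 2.0]}, tau_avg, scale=0.5)
```

Averaging several vectors, as in `average_vectors`, mirrors the abstract's remark about combining vectors from multiple domains, again only at the level of the generic arithmetic.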
-
Continuous Edge Chromatic Numbers of Abelian Group Actions
Authors:
Su Gao,
Ruijun Wang,
Tianhao Wang
Abstract:
We prove that for any generating set $S$ of $Γ=\mathbb {Z}^n$, the continuous edge chromatic number of the Schreier graph of the Bernoulli shift action $G=F(S,2^Γ)$ is $χ'_c(G)=χ'(G)+1$. In particular, for the standard generating set, the continuous edge chromatic number of $F(2^{\mathbb {Z}^n})$ is $2n+1$.
Submitted 4 June, 2024;
originally announced June 2024.
-
High-resolution Observation of Blowout Jets Regulated by Sunspot Rotation
Authors:
Tingyu Gou,
Rui Liu,
Yang Su,
Astrid M. Veronig,
Hanya Pan,
Runbin Luo,
Weiqun Gan
Abstract:
Coronal jets are believed to be the miniature version of large-scale solar eruptions. In particular, the eruption of a mini-filament inside the base arch is suggested to be the trigger and even driver of blowout jets. Here we propose an alternative triggering mechanism, based on high-resolution H-alpha observations of a blowout jet associated with a mini-filament and an M1.2-class flare. The mini-filament remains largely stationary during the blowout jet, except that it is straddled by flare loops connecting two flare ribbons, indicating that the magnetic arcade embedding the mini-filament has been torn into two parts, with the upper part escaping with the blowout jet. In the wake of the flare, the southern end of the mini-filament fans out like neighboring fibrils, indicative of mass and field exchanges between the mini-filament and the fibrils. The blowout jet is preceded by a standard jet. With H-alpha fibrils moving toward the single-strand spire in a sweeping fashion, the standard jet transitions to the blowout jet. A similar pattern of standard-to-blowout jet transition occurs in an earlier C-class flare before the mini-filament forms. The spiraling morphology and sweeping direction of these fibrils are suggestive of their footpoints being dragged by the leading sunspot that undergoes clockwise rotation for over two days. Soon after the sunspot rotation reaches a peak angular speed as fast as 10 deg/hr, the dormant active region becomes flare-productive, and the mini-filament forms through the interaction of moving magnetic features from the rotating sunspot with satellite spots/pores. Hence, we suggest that the sunspot rotation plays a key role in building up free energy for flares and jets and in triggering blowout jets by inducing sweeping motions of fibrils.
Submitted 4 June, 2024;
originally announced June 2024.
-
Latent Style-based Quantum GAN for high-quality Image Generation
Authors:
Su Yeon Chang,
Supanut Thanasilp,
Bertrand Le Saux,
Sofia Vallecorsa,
Michele Grossi
Abstract:
Quantum generative modeling is among the promising candidates for achieving a practical advantage in data analysis. Nevertheless, one key challenge is to generate large-size images comparable to those generated by their classical counterparts. In this work, we take an initial step in this direction and introduce the Latent Style-based Quantum GAN (LaSt-QGAN), which employs a hybrid classical-quantum approach in training Generative Adversarial Networks (GANs) for arbitrarily complex data generation. This novel approach relies on powerful classical auto-encoders to map a high-dimensional original image dataset into a latent representation. The hybrid classical-quantum GAN operates in this latent space to generate an arbitrary number of fake features, which are then passed back to the auto-encoder to reconstruct the original data. Our LaSt-QGAN can be successfully trained on realistic computer vision datasets beyond the standard MNIST, namely Fashion MNIST (fashion products) and SAT4 (Earth Observation images) with 10 qubits, achieving performance comparable to (and in some metrics better than) classical GANs. Moreover, we analyze the barren plateau phenomena within this context of the continuous quantum generative model using a polynomial depth circuit and propose a method to mitigate the detrimental effect during the training of deep-depth networks. Through empirical experiments and theoretical analysis, we demonstrate the potential of LaSt-QGAN for practical use in image generation and open the possibility of applying it to a larger dataset in the future.
Submitted 4 June, 2024;
originally announced June 2024.
-
Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs
Authors:
Zhiwei Cao,
Qian Cao,
Yu Lu,
Ningxin Peng,
Luyang Huang,
Shanbo Cheng,
Jinsong Su
Abstract:
The growing popularity of Large Language Models (LLMs) has sparked interest in context compression for these models. However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports this hypothesis, emphasizing the significance of retaining key information to maintain model performance under high compression ratios. As a result, we introduce Query-Guided Compressor (QGC), which leverages queries to guide the context compression process, effectively preserving key information within the compressed context. Additionally, we employ a dynamic compression strategy. We validate the effectiveness of our proposed QGC on the Question Answering task, including NaturalQuestions, TriviaQA, and HotpotQA datasets. Experimental results show that QGC can consistently perform well even at high compression ratios, which also offers significant benefits in terms of inference cost and throughput.
Submitted 17 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Proxy Denoising for Source-Free Domain Adaptation
Authors:
Song Tang,
Wenxin Su,
Mao Ye,
Jianwei Zhang,
Xiatian Zhu
Abstract:
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain with no access to the source data. Inspired by the success of pre-trained large vision-language (ViL) models in many other applications, the latest SFDA methods have also validated the benefit of ViL models by leveraging their predictions as pseudo supervision. However, we observe that ViL's predictions could be noisy and inaccurate at an unknown rate, potentially introducing additional negative effects during adaptation. To address this thus-far ignored challenge, in this paper, we introduce a novel Proxy Denoising (ProDe) approach. Specifically, we leverage the ViL model as a proxy to facilitate the adaptation process towards the latent domain-invariant space. Critically, we design a proxy denoising mechanism for correcting ViL's predictions. This is grounded on a novel proxy confidence theory, obtained by elegantly modeling the domain adaptation effect of the proxy's divergence against the domain-invariant space. To capitalize on the corrected proxy, we further derive a mutual knowledge distilling regularization. Extensive experiments show that our ProDe significantly outperforms the current state-of-the-art alternatives under both the conventional closed-set setting and the more challenging open-set, partial-set and generalized SFDA settings. The code will be released soon.
Submitted 3 June, 2024;
originally announced June 2024.
-
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Authors:
Chun-Hung Wu,
Shih-Hong Chen,
Chih-Yao Hu,
Hsin-Yu Wu,
Kai-Hsin Chen,
Yu-You Chen,
Chih-Hai Su,
Chih-Kuo Lee,
Yu-Lun Liu
Abstract:
This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronary video dataset with high-quality, manually labeled segmentation ground truth. Our evaluation demonstrates that DeNVeR outperforms current state-of-the-art methods in vessel segmentation. This paper marks an advance in medical imaging, providing a robust, data-efficient tool for disease diagnosis and treatment planning and setting a new standard for future research in video vessel segmentation. See our project page for video results at https://kirito878.github.io/DeNVeR/.
Submitted 3 June, 2024;
originally announced June 2024.
-
Learning from Streaming Data when Users Choose
Authors:
Jinyan Su,
Sarah Dean
Abstract:
In digital markets comprised of many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service makes use of the user data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in return, influences the model update, leading to a feedback loop. In this paper, we formalize the above dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. Theoretically, we show that our algorithm asymptotically converges to stationary points of the overall loss almost surely. We also experimentally demonstrate the utility of our algorithm with real world data.
Submitted 3 June, 2024;
originally announced June 2024.
-
Sparsity-Accelerated Training for Large Language Models
Authors:
Da Ma,
Lu Chen,
Pengyu Wang,
Hongshen Xu,
Hanqi Li,
Liangtai Sun,
Su Zhu,
Shuai Fan,
Kai Yu
Abstract:
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging \emph{sparsity} in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a $45\%$ throughput improvement in continual pre-training and saves $38\%$ training time in supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. Our code is available at https://github.com/OpenDFM/SAT.
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
Authors:
Haoran Que,
Jiaheng Liu,
Ge Zhang,
Chenchen Zhang,
Xingwei Qu,
Yinghao Ma,
Feiyu Duan,
Zhiqi Bai,
Jiakai Wang,
Yuanxing Zhang,
Xu Tan,
Jie Fu,
Wenbo Su,
Jiamang Wang,
Lin Qu,
Bo Zheng
Abstract:
Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually adopt laborious human efforts by grid-searching on a set of mixture ratios, which require high GPU training consumption costs. Besides, we cannot guarantee the selected ratio is optimal for the specific domain. To address the limitations of existing methods, inspired by the Scaling Law for performance prediction, we propose to investigate the Scaling Law of the Domain-specific Continual Pre-Training (D-CPT Law) to decide the optimal mixture ratio with acceptable training costs for LLMs of different sizes. Specifically, by fitting the D-CPT Law, we can easily predict the general and downstream performance of arbitrary mixture ratios, model sizes, and dataset sizes using small-scale training costs on limited experiments. Moreover, we also extend our standard D-CPT Law on cross-domain settings and propose the Cross-Domain D-CPT Law to predict the D-CPT law of target domains, where very small training costs (about 1% of the normal training costs) are needed for the target domains. Comprehensive experimental results on six downstream domains demonstrate the effectiveness and generalizability of our proposed D-CPT Law and Cross-Domain D-CPT Law.
Submitted 3 June, 2024;
originally announced June 2024.
-
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
Authors:
Ken Deng,
Jiaheng Liu,
He Zhu,
Congnan Liu,
Jingxin Li,
Jiakai Wang,
Peng Zhao,
Chenchen Zhang,
Yanan Wu,
Xueqiao Yin,
Yuanxing Zhang,
Wenbo Su,
Bangyu Xiang,
Tiezheng Ge,
Bo Zheng
Abstract:
Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies. Besides, the existing benchmarks usually focus on limited code completion scenarios, which cannot reflect well the repository-level code completion abilities of existing methods. To address these limitations, we propose the R2C2-Coder to enhance and benchmark the real-world repository-level code completion abilities of code Large Language Models, where the R2C2-Coder includes a code prompt construction method R2C2-Enhance and a well-designed benchmark R2C2-Bench. Specifically, in R2C2-Enhance, we first construct the candidate retrieval pool and then assemble the completion prompt by retrieving from the retrieval pool for each completion cursor position. Second, based on R2C2-Enhance, we can construct a more challenging and diverse R2C2-Bench with training, validation and test splits, where a context perturbation strategy is proposed to simulate real-world repository-level code completion well. Extensive results on multiple benchmarks demonstrate the effectiveness of our R2C2-Coder.
Submitted 3 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of semileptonic $D^{+}_s$ decays via $e^+e^-\to D_s^{*+}D_s^{*-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are ${\mathcal B}(D_s^+\to ηe^+ν_e)=(2.35\pm0.11_{\rm stat}\pm 0.10_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to η^\prime e^+ν_e)=(0.82\pm0.09_{\rm stat}\pm 0.04_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to φe^+ν_e)=(2.21\pm0.16_{\rm stat}\pm 0.11_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to f_0(980) e^+ν_e,f_0(980)\toπ^+π^-)=(0.15\pm0.02_{\rm stat}\pm 0.01_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to K^0 e^+ν_e)=(0.24\pm0.04_{\rm stat}\pm 0.01_{\rm syst})\%,$ and ${\mathcal B}(D_s^+\to K^{*0} e^+ν_e)=(0.19\pm0.03_{\rm stat}\pm 0.01_{\rm syst})\%.$ These results are consistent with those measured via the $e^+e^-\to D_s^{*\pm}D_s^{\mp}$ process by BESIII and CLEO. The hadronic transition form factors $D^+_s\to ηe^+ν_e$, $D^+_s\to η^\prime e^+ν_e$, and $D^+_s\to K^0 e^+ν_e$ at four-momentum transfer squared $q^2$ = 0 are determined to be $f^η_+(0) = 0.482 \pm 0.011_{\rm stat} \pm 0.009_{\rm syst}\pm0.004_{\rm input},$ $f^{η^{\prime}}_+(0) = 0.562 \pm 0.031_{\rm stat} \pm 0.014_{\rm syst}\pm0.003_{\rm input},$ and $f^{K^0}_+(0) = 0.624 \pm 0.052_{\rm stat} \pm 0.013_{\rm syst}\pm0.002_{\rm input}.$
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Demonstration Augmentation for Zero-shot In-context Learning
Authors:
Yi Su,
Yunpeng Tai,
Yixin Ji,
Juntao Li,
Bowen Yan,
Min Zhang
Abstract:
Large Language Models (LLMs) have demonstrated an impressive capability known as In-context Learning (ICL), which enables them to acquire knowledge from textual demonstrations without the need for parameter updates. However, many studies have highlighted that the model's performance is sensitive to the choice of demonstrations, presenting a significant challenge for practical applications where we lack prior knowledge of user queries. Consequently, we need to construct an extensive demonstration pool and incorporate external databases to assist the model, leading to considerable time and financial costs. In light of this, some recent research has shifted focus towards zero-shot ICL, aiming to reduce the model's reliance on external information by leveraging their inherent generative capabilities. Despite the effectiveness of these approaches, the content generated by the model may be unreliable, and the generation process is time-consuming. To address these issues, we propose Demonstration Augmentation for In-context Learning (DAIL), which employs the model's previously predicted historical samples as demonstrations for subsequent ones. DAIL brings no additional inference cost and does not rely on the model's generative capabilities. Our experiments reveal that DAIL can significantly improve the model's performance over direct zero-shot inference and can even outperform few-shot ICL without any external information.
Submitted 3 June, 2024;
originally announced June 2024.
-
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Authors:
Yongxin Zhu,
Dan Su,
Liqiang He,
Linli Xu,
Dong Yu
Abstract:
While recent advancements in speech language models have achieved significant progress, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs. In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio waveforms into two distinct types of discrete speech representations and integrates them within a hierarchical transformer architecture, allowing for a unified one-stage generation process and enhancing Hi-Res audio generation capabilities. By training on large corpora of speech in an end-to-end unsupervised manner, GPST can generate syntactically consistent speech with diverse speaker identities. Given a brief 3-second prompt, GPST can produce natural and coherent personalized speech, demonstrating in-context learning abilities. Moreover, our approach can be easily extended to cross-lingual speech generation by incorporating multi-lingual semantic tokens and universal acoustic tokens. Experimental results indicate that GPST significantly outperforms the existing speech language models in terms of word error rate, speech quality, and speaker similarity. See https://youngsheen.github.io/GPST/demo for demo samples.
Submitted 3 June, 2024;
originally announced June 2024.
-
A Robust Residual-Based Test for Structural Changes in Factor Models
Authors:
Bin Peng,
Liangjun Su,
Yayi Yan
Abstract:
In this paper, we propose an easy-to-implement residual-based specification testing procedure for detecting structural changes in factor models, which is powerful against both smooth and abrupt structural changes with unknown break dates. The proposed test is robust against the over-specified number of factors, and serially and cross-sectionally correlated error processes. A new central limit theorem is given for the quadratic forms of panel data with dependence over both dimensions, thereby filling a gap in the literature. We establish the asymptotic properties of the proposed test statistic, and accordingly develop a simulation-based scheme to select critical value in order to improve finite sample performance. Through extensive simulations and a real-world application, we confirm our theoretical results and demonstrate that the proposed test exhibits desirable size and power in practice.
Submitted 2 June, 2024;
originally announced June 2024.
-
ViCTORIA project: The LOFAR-MeerKAT view of AGN in Virgo cluster early-type galaxies
Authors:
A. Spasic,
H. W. Edler,
Y. Su,
M. Brüggen,
F. de Gasperin,
T. Pasini,
V. Heesen,
M. Simonte,
A. Boselli,
H. J. A. Röttgering,
M. Fossati
Abstract:
The evolution of Active Galactic Nuclei (AGN) is closely connected to their host galaxies and surroundings. Via feedback processes, AGN can counteract the cooling of the intracluster medium (ICM) and suppress star formation in their host galaxies. Radio observations at low frequencies provide a glimpse into the history of AGN activity. The Virgo cluster is a substantial reservoir of nearby galaxies and provides an ideal laboratory for the study of AGN as well as their feedback mechanisms. The aim of our work is to characterise the AGN population within the Virgo cluster down to low radio luminosities, constrain the AGN duty cycle and investigate environmental feedback in cluster member galaxies. We analyse 144 MHz and 1.3 GHz radio observations of early-type galaxies from the ACS Virgo Cluster Survey (ACSVCS) taken with LOFAR and MeerKAT. We detect 12 of these galaxies at 144 MHz, 5 of which show clearly extended radio emission. The radio luminosity shows a strong dependence on the stellar mass of the host galaxy, in agreement with previous results. As a notable outlier, the massive elliptical galaxy NGC 4365 ($M_* = 2.2 \times 10^{11} M_\odot$) is not detected as a compact source in the LOFAR observations. Instead, it is surrounded by diffuse, low-surface brightness emission, which hints towards a past phase of stronger nuclear activity. Furthermore, we find a cavity in NGC 4472 (= M 49) inflated by the wide-angle tail only visible in the LOFAR data, which implies that the cavity was created by a past outburst. The corresponding cavity power is of the same order of magnitude as the jet power in the present duty cycle of the AGN.
Submitted 2 June, 2024;
originally announced June 2024.
-
Filament eruption by multiple reconnections
Authors:
Y. Liu,
G. P. Ruan,
B. Schmieder,
J. H. Guo,
Y. Chen,
R. S. Zheng,
J. T. Su,
B. Wang
Abstract:
Filament eruption is a common phenomenon in solar activity, but the triggering mechanism is not well understood. We focus our study on a filament eruption located in a complex nest of three active regions close to a coronal hole. The filament eruption is observed at multiple wavelengths: by the GONG, the STEREO, the SUTRI, and the AIA and Helioseismic and Magnetic Imager (HMI) on board the SDO. Thanks to high temporal-resolution observations, we were able to analyze the evolution of the fine structure of the filament in detail. The filament changes direction during the eruption, which is followed by a halo coronal mass ejection detected by the LASCO on board the SOHO. A Type III radio burst was also registered at the time of the eruption. To investigate the process of the eruption, we analyzed the magnetic topology of the filament region adopting a nonlinear force-free-field (NLFFF) extrapolation method and the polytropic global magnetohydrodynamic (MHD) modeling. We modeled the filament by embedding a twisted flux rope with the regularized Biot-Savart Laws (RBSL) method in the ambient magnetic field. The extrapolation results show that magnetic reconnection occurs in a fan-spine configuration resulting in a circular flare ribbon. The global modeling of the corona demonstrates that there was an interaction between the filament and open field lines, causing a deflection of the filament in the direction of the observed CME eruption and dimming area. The modeling supports the following scenario: magnetic reconnection not only occurs with the filament itself (the flux rope) but also with the background magnetic field lines and open field lines of the coronal hole located to the east of the flux rope. This multiwavelength analysis indicates that the filament undergoes multiple magnetic reconnections on small and large scales with a drifting of the flux rope.
Submitted 2 June, 2024;
originally announced June 2024.
-
An AI-Inspired Numerical Method in the Quark Model: Application to Finding the Wave Functions for Heavy Tetraquark States
Authors:
Daeho Park,
Su Houng Lee
Abstract:
The current ongoing advancements in AI have shed light on the landscape of numerical analysis in science. Inspired by the path of achievement of AI, we have developed a method to construct accurate ground state wave functions of multiquark configurations within a quark model. We successfully tested our method through comparisons with meson-type two-body systems with analytic and numerical solutions. We then applied our method to find the ground-state solutions of $T_{cc}$($ud\bar{c}\bar{c}$) and $T_{bb}$($ud\bar{b}\bar{b}$) states. Our findings indicate that our approach outperforms existing methods, achieving greater accuracy in reproducing highly intricate configurations. Within the model parameters, we find that the $T_{cc}$ is a compact multiquark configuration.
Submitted 2 June, 2024;
originally announced June 2024.
-
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Authors:
Jun Li,
Tongkun Su,
Baoliang Zhao,
Faqin Lv,
Qiong Wang,
Nassir Navab,
Ying Hu,
Zhongliang Jiang
Abstract:
Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive comparisons with other state-of-the-art approaches demonstrate its superior performance across all three datasets. Code and dataset are available at this link.
Submitted 2 June, 2024;
originally announced June 2024.
-
GLCAN: Global-Local Collaborative Auxiliary Network for Local Learning
Authors:
Feiyu Zhu,
Yuming Zhang,
Changpeng Cai,
Guinan Guo,
Jiao Li,
Xiuyuan Guo,
Quanwei Zhang,
Peizhe Wang,
Chenghao He,
Junhao Su
Abstract:
Traditional deep neural networks typically use end-to-end backpropagation, which often places a heavy burden on GPU memory. Another promising training method is local learning, which involves splitting the network into blocks and training them in parallel with the help of an auxiliary network. Local learning has been widely studied and applied to image classification tasks, and its performance is comparable to that of end-to-end methods. However, different image tasks often rely on different feature representations, which typical auxiliary networks find difficult to adapt to. To solve this problem, we propose a construction method for the Global-Local Collaborative Auxiliary Network (GLCAN), which provides a macroscopic design approach for auxiliary networks. This is the first demonstration that local learning methods can be successfully applied to other tasks such as object detection and super-resolution. GLCAN not only saves a substantial amount of GPU memory, but also achieves performance comparable to an end-to-end approach on datasets for multiple different tasks.
Submitted 1 June, 2024;
originally announced June 2024.
-
DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation
Authors:
Qihang Xie,
Mengguo Guo,
Lei Mou,
Dan Zhang,
Da Chen,
Caifeng Shan,
Yitian Zhao,
Ruisheng Su,
Jiong Zhang
Abstract:
Cerebrovascular diseases (CVDs) remain a leading cause of global disability and mortality. Digital Subtraction Angiography (DSA) sequences, recognized as the gold standard for diagnosing CVDs, can clearly visualize the dynamic flow and reveal pathological conditions within the cerebrovasculature. Therefore, precise segmentation of cerebral arteries (CAs) and classification between their main trunks and branches are crucial for physicians to accurately quantify diseases. However, achieving accurate CA segmentation in DSA sequences remains a challenging task due to small vessels with low contrast, and ambiguity between vessels and residual skull structures. Moreover, the lack of publicly available datasets limits exploration in the field. In this paper, we introduce a DSA Sequence-based Cerebral Artery segmentation dataset (DSCA), the first publicly accessible dataset designed specifically for pixel-level semantic segmentation of CAs. Additionally, we propose DSANet, a spatio-temporal network for CA segmentation in DSA sequences. Unlike existing DSA segmentation methods that focus only on a single frame, the proposed DSANet introduces a separate temporal encoding branch to capture dynamic vessel details across multiple frames. To enhance small vessel segmentation and improve vessel connectivity, we design a novel TemporalFormer module to capture global context and correlations among sequential frames. Furthermore, we develop a Spatio-Temporal Fusion (STF) module to effectively integrate spatial and temporal features from the encoder. Extensive experiments demonstrate that DSANet outperforms other state-of-the-art methods in CA segmentation, achieving a Dice of 0.9033.
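For readers unfamiliar with the Dice score quoted at the end of this abstract, the standard definition for binary masks is 2|A∩B| / (|A| + |B|). The sketch below computes it over flattened masks. It is a generic illustration rather than the authors' evaluation code; the function name and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (nonzero = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels counted over both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixel.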
Submitted 1 June, 2024;
originally announced June 2024.
-
Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning
Authors:
Shengyu Tao,
Mengtian Zhang,
Zixi Zhao,
Haoyang Li,
Ruifei Ma,
Yunhong Che,
Xin Sun,
Lin Su,
Xiangyu Chen,
Zihao Zhou,
Heng Chang,
Tingwei Cao,
Xiao Xiao,
Yaojun Liu,
Wenjun Yu,
Zhongling Xu,
Yang Li,
Han Hao,
Xuan Zhang,
Xiaosong Hu,
Guangmin Zhou
Abstract:
Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed machine learning approach can quantify and visualize temporally resolved losses concerning thermodynamics and kinetics using only electric signals. Our method enables non-destructive degradation pattern characterization, expediting temperature-adaptable predictions of entire lifetime trajectories, rather than end-of-life points. The verification speed is 25 times faster while maintaining 95.1% accuracy across temperatures. Such advances facilitate more sustainable management of defective prototypes before massive production, establishing a 19.76 billion USD scrap material recycling market by 2060 in China. By incorporating stepwise charge acceptance as a measure of the initial manufacturing variability of normally identical batteries, we can immediately identify long-term degradation variations. We attribute the predictive power to interpreting machine learning insights using material-agnostic featurization taxonomy for degradation pattern decoupling. Our findings offer new possibilities for dynamic system analysis, such as battery prototype degradation, demonstrating that complex pattern evolutions can be accurately predicted in a non-destructive and data-driven fashion by integrating physics-informed machine learning.
Submitted 31 May, 2024;
originally announced June 2024.
-
Contrastive Learning Via Equivariant Representation
Authors:
Sifan Song,
**feng Wang,
Qiaochu Zhao,
Xiang Li,
Dufan Wu,
Angelos Stefanidis,
Jionglong Su,
S. Kevin Zhou,
Quanzheng Li
Abstract:
Invariant-based Contrastive Learning (ICL) methods have achieved impressive performance across various domains. However, the absence of a representation for distortion (augmentation)-related information in the latent space makes ICL sub-optimal regarding training efficiency and robustness in downstream tasks. Recent studies suggest that introducing equivariance into Contrastive Learning (CL) can improve overall performance. In this paper, we rethink the roles of augmentation strategies and equivariance in improving CL efficacy. We propose a novel Equivariant-based Contrastive Learning (ECL) framework, CLeVER (Contrastive Learning Via Equivariant Representation), compatible with augmentation strategies of arbitrary complexity for various mainstream CL methods and model frameworks. Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from data, thereby improving the training efficiency and robustness of baseline models in downstream tasks.
Submitted 31 May, 2024;
originally announced June 2024.
-
Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey
Authors:
Bowen Jiang,
Yangxinyu Xie,
Xiaomeng Wang,
Weijie J. Su,
Camillo J. Taylor,
Tanwi Mallick
Abstract:
Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios involving multiple layers of context. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this paper aims to understand whether multi-modal and multi-agent systems are advancing toward rationality by surveying the state-of-the-art works, identifying advancements over single-agent and single-modal systems in terms of rationality, and discussing open problems and future directions. We maintain an open repository at https://github.com/bowen-upenn/MMMA_Rationality.
Submitted 18 June, 2024; v1 submitted 31 May, 2024;
originally announced June 2024.
-
Amplitude analysis of the radiative decay $B^0_s\to K^+K^-γ$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1061 additional authors not shown)
Abstract:
A search for radiative decay of $B^0_s$ mesons to orbitally excited $K^+K^-$ states is performed using proton-proton collisions recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. The dikaon spectrum in the mass range $m_{KK}<2400$ MeV/$c^2$ is dominated by the $φ(1020)$ resonance that accounts for almost 70% of the decay rate. Considering the possible contributions of $f_2(1270)$, $f'_2(1525)$ and $f_2(2010)$ meson states, the overall tensor contribution to the amplitude is measured to be \begin{equation}
{\cal F}_{\{f_2\}}=16.8\pm 0.5\mathrm{~(stat.)}\pm0.7\mathrm{~(syst.)}\%,\nonumber \end{equation}
mostly dominated by the $f'_2(1525)$ state. Several statistically equivalent solutions are obtained for the detailed resonant structure depending on whether the smaller amplitudes interfere destructively or constructively with the dominant amplitude. The preferred solution that corresponds to the lowest values of the fit fractions along with constructive interference leads to the relative branching ratio measurement \begin{equation}
\frac{{\cal B}(B^0_s\to f'_2γ)}{{\cal B}(B^0_s\toφγ)}= 19.4^{+0.9}_{-0.8}\mathrm{~(stat.)}{}^{+1.4}_{-0.5}\mathrm{~(syst.)}\pm0.5\mathrm{~({\cal B})}\%\nonumber, \end{equation} where the last uncertainty is due to the ratio of measured branching fractions to the $K^+K^-$ final state. This result represents the first observation of the radiative $B^0_s\to f'_2(1525)γ$ decay, which is the second radiative transition observed in the $B^0_s$ sector.
Submitted 31 May, 2024;
originally announced June 2024.
-
Progress in patterned wax stamp for prototyping of paper-based microfluidic analytical devices via injection molding
Authors:
Zhizhi Zhou,
Jiahuan Jiang,
Yuanyuan Sun,
Qing Qin,
Sitong Yuan,
Xilin Wang,
Jianhua Jiang,
Yifeng Su,
Xing Hu,
Mingying Liu,
Feng Yang
Abstract:
In this study, we successfully developed two-dimensional paper-based analytical devices using a hybrid technique of injection molding and embossing. This innovative approach involves passive or active delivery of molten wax onto a glass substrate through a sealed chip, facilitating wax stamp creation.
Submitted 31 May, 2024;
originally announced May 2024.
-
Context-aware Difference Distilling for Multi-change Captioning
Authors:
Yunbin Tu,
Liang Li,
Li Su,
Zheng-Jun Zha,
Chenggang Yan,
Qingming Huang
Abstract:
Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language. Compared with single-change captioning, this task requires the model to have higher-level cognition ability to reason about an arbitrary number of changes. In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences. Given an image pair, CARD first decouples context features that aggregate all similar/dissimilar semantics, termed common/difference context features. Then, the consistency and independence constraints are designed to guarantee the alignment/discrepancy of common/difference context features. Further, the common context features guide the model to mine locally unchanged features, which are subtracted from the pair to distill locally difference features. Next, the difference context features augment the locally difference features to ensure that all changes are distilled. In this way, we obtain an omni-representation of all changes, which is translated into linguistic sentences by a transformer decoder. Extensive experiments on three public datasets show CARD performs favourably against state-of-the-art methods. The code is available at https://github.com/tuyunbin/CARD.
Submitted 7 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
GANcrop: A Contrastive Defense Against Backdoor Attacks in Federated Learning
Authors:
Xiaoyun Gan,
Shanyu Gan,
Taizhi Su,
Peng Liu
Abstract:
With heightened awareness of data privacy protection, Federated Learning (FL) has attracted widespread attention as a privacy-preserving distributed machine learning method. However, the distributed nature of federated learning also provides opportunities for backdoor attacks, where attackers can guide the model to produce incorrect predictions without affecting the global model training process.
This paper introduces a novel defense mechanism against backdoor attacks in federated learning, named GANcrop. This approach leverages contrastive learning to deeply explore the disparities between malicious and benign models for attack identification, followed by the utilization of Generative Adversarial Networks (GAN) to recover backdoor triggers and implement targeted mitigation strategies. Experimental findings demonstrate that GANcrop effectively safeguards against backdoor attacks, particularly in non-IID scenarios, while maintaining satisfactory model accuracy, showcasing its remarkable defensive efficacy and practical utility.
Submitted 31 May, 2024;
originally announced May 2024.
-
Search for $e^{+}e^{-}\toη'ψ(2S)$ at center-of-mass energies from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90% confidence level are determined.
Submitted 31 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ} \rightarrow Λ\barΛφ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured to be $( 2.99\pm1.24\pm0.19) \times 10^{-5}$, $(6.01\pm0.90\pm0.40 )\times 10^{-5}$, and $(7.13\pm0.81\pm0.36) \times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No obvious enhancement near the $Λ\barΛ$ production threshold or excited $Λ$ state is found in the $Λφ$ (or $\barΛφ$) system.
Submitted 31 May, 2024;
originally announced May 2024.
-
Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations
Authors:
Zilin Ma,
Susannah Su,
Nathan Zhao,
Linn Bieske,
Blake Bullwinkel,
Yanyi Zhang,
Sophia Yang,
Ziqing Luo,
Siyao Li,
Gekai Liao,
Boxiang Wang,
**glun Gao,
Zihan Wen,
Claude Bruderlein,
Weiwei Pan
Abstract:
Humanitarian negotiations in conflict zones, called frontline negotiation, are often highly adversarial, complex, and high-risk. Several best practices have emerged over the years that help negotiators extract insights from large datasets to navigate nuanced and rapidly evolving scenarios. Recent advances in large language models (LLMs) have sparked interest in the potential for AI to aid decision making in frontline negotiation. Through in-depth interviews with 13 experienced frontline negotiators, we identified their needs for AI-assisted case analysis and creativity support, as well as concerns surrounding confidentiality and model bias. We further explored the potential for AI augmentation of three standard tools used in frontline negotiation planning. We evaluated the quality and stability of our ChatGPT-based negotiation tools in the context of two real cases. Our findings highlight the potential for LLMs to enhance humanitarian negotiations and underscore the need for careful ethical and practical considerations.
Submitted 30 May, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
A Staged Approach using Machine Learning and Uncertainty Quantification to Predict the Risk of Hip Fracture
Authors:
Anjum Shaik,
Kristoffer Larsen,
Nancy E. Lane,
Chen Zhao,
Kuan-Jui Su,
Joyce H. Keyak,
Qing Tian,
Qiuying Sha,
Hui Shen,
Hong-Wen Deng,
Weihua Zhou
Abstract:
Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using CNNs to extract features from hip DXA images, along with clinical variables, shape measurements, and texture features, our method provides a comprehensive framework for assessing fracture risk. A staged machine learning-based model was developed using two ensemble models: Ensemble 1 (clinical variables only) and Ensemble 2 (clinical variables and DXA imaging features). This staged approach used uncertainty quantification from Ensemble 1 to decide if DXA features are necessary for further prediction. Ensemble 2 exhibited the highest performance, achieving an AUC of 0.9541, an accuracy of 0.9195, a sensitivity of 0.8078, and a specificity of 0.9427. The staged model also performed well, with an AUC of 0.8486, an accuracy of 0.8611, a sensitivity of 0.5578, and a specificity of 0.9249, outperforming Ensemble 1, which had an AUC of 0.5549, an accuracy of 0.7239, a sensitivity of 0.1956, and a specificity of 0.8343. Furthermore, the staged model suggested that 54.49% of patients did not require DXA scanning. It effectively balanced accuracy and specificity, offering a robust solution when DXA data acquisition is not always feasible. Statistical tests confirmed significant differences between the models, highlighting the advantages of the advanced modeling strategies. Our staged approach could identify individuals at risk with a high accuracy but reduce the unnecessary DXA scanning. It has great promise to guide interventions to prevent hip fractures with reduced cost and radiation.
Submitted 30 May, 2024;
originally announced May 2024.
-
Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning
Authors:
Hengkai Tan,
Songming Liu,
Kai Ma,
Chengyang Ying,
Xingxing Zhang,
Hang Su,
Jun Zhu
Abstract:
Transformer has shown promise in reinforcement learning to model time-varying features for obtaining generalized low-level robot policies on diverse robotics datasets in embodied learning. However, it still suffers from the issues of low data efficiency and high inference latency. In this paper, we propose to investigate the task from a new perspective of the frequency domain. We first observe that the energy density in the frequency domain of a robot's trajectory is mainly concentrated in the low-frequency part. Then, we present the Fourier Controller Network (FCNet), a new network that uses Short-Time Fourier Transform (STFT) to extract and encode time-varying features through frequency domain interpolation. In order to do real-time decision-making, we further adopt FFT and Sliding DFT methods in the model architecture to achieve parallel training and efficient recurrent inference. Extensive results in both simulated (e.g., D4RL) and real-world environments (e.g., robot locomotion) demonstrate FCNet's substantial efficiency and effectiveness over existing methods such as Transformer, e.g., FCNet outperforms Transformer on multi-environmental robotics datasets of all types of sizes (from 1.9M to 120M). The project page and code can be found at https://thkkk.github.io/fcnet.
Submitted 5 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization
Authors:
Jiang Wang,
Yuanzheng He,
Daobilige Su,
Katsutoshi Itoyama,
Kazuhiro Nakadai,
Junfeng Wu,
Shoudong Huang,
Youfu Li,
He Kong
Abstract:
Robot audition systems with multiple microphone arrays have many applications in practice. However, accurate calibration of multiple microphone arrays remains challenging because there are many unknown parameters to be identified, including the relative transforms (i.e., orientation, translation) and asynchronous factors (i.e., initial time offset and sampling clock difference) between microphone arrays. To tackle these challenges, in this paper, we adopt batch simultaneous localization and mapping (SLAM) for joint calibration of multiple asynchronous microphone arrays and sound source localization. Using the Fisher information matrix (FIM) approach, we first conduct the observability analysis (i.e., parameter identifiability) of the above-mentioned calibration problem and establish necessary/sufficient conditions under which the FIM and the Jacobian matrix have full column rank, which implies the identifiability of the unknown parameters. We also discover several scenarios where the unknown parameters are not uniquely identifiable. Subsequently, we propose an effective framework to initialize the unknown parameters, which is used as the initial guess in batch SLAM for multiple microphone arrays calibration, aiming to further enhance optimization accuracy and convergence. Extensive numerical simulations and real experiments have been conducted to verify the performance of the proposed method. The experiment results show that the proposed pipeline achieves higher accuracy with fast convergence in comparison to methods that use the noise-corrupted ground truth of the unknown parameters as the initial guess in the optimization and other existing frameworks.
Submitted 30 May, 2024;
originally announced May 2024.
-
Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models
Authors:
Shuyuan Liu,
Jiawei Chen,
Shouwei Ruan,
Hang Su,
Zhaoxia Yin
Abstract:
Embodied intelligence empowers agents with a profound sense of perception, enabling them to respond in a manner closely aligned with real-world situations. Large Language Models (LLMs) delve into language instructions with depth, serving a crucial role in generating plans for intricate tasks. Thus, LLM-based embodied models further enhance the agent's capacity to comprehend and process information. However, this amalgamation also ushers in new challenges in the pursuit of heightened intelligence. Specifically, attackers can manipulate LLMs to produce irrelevant or even malicious outputs by altering their prompts. Confronted with this challenge, we observe a notable absence of multi-modal datasets essential for comprehensively evaluating the robustness of LLM-based embodied models. Consequently, we construct the Embodied Intelligent Robot Attack Dataset (EIRAD), tailored specifically for robustness evaluation. Additionally, two attack strategies are devised, including untargeted attacks and targeted attacks, to effectively simulate a range of diverse attack scenarios. At the same time, during the attack process, to more accurately ascertain whether our method is successful in attacking the LLM-based embodied model, we devise a new attack success evaluation method utilizing the BLIP2 model. Recognizing the time and cost-intensive nature of the GCG algorithm in attacks, we devise a scheme for prompt suffix initialization based on various target tasks, thus expediting the convergence process. Experimental results demonstrate that our method exhibits a superior attack success rate when targeting LLM-based embodied models, indicating a lower level of decision-level robustness in these models.
Submitted 3 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Trembling Motion of Exciton-Polaritons Close to the Rashba-Dresselhaus Regime
Authors:
Wen Wen,
Jie Liang,
Huawen Xu,
Feng Jin,
Yuri G. Rubo,
Timothy C. H. Liew,
Rui Su
Abstract:
We report the experimental emulation of trembling quantum motion, or Zitterbewegung, of exciton polaritons in a perovskite microcavity at room temperature. By introducing liquid crystal molecules into the microcavity, we achieve spinor states with synthetic Rashba-Dresselhaus spin-orbit coupling and tunable energy splitting. Under a resonant excitation, the polariton fluid exhibits clear trembling motion perpendicular to its flowing direction, accompanied by a unique spin pattern resembling interlocked fingers. Furthermore, leveraging on the sizable tunability of energy gaps by external electrical voltages, we observe the continuous transition of polariton Zitterbewegung from relativistic (small gaps) to non-relativistic (large gaps) regimes. Our findings pave the way for using exciton polaritons in the emulation of relativistic quantum physics.
Submitted 30 May, 2024;
originally announced May 2024.
-
Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning
Authors:
Guogang Zhu,
Xuefeng Liu,
Xinghao Wu,
Shaojie Tang,
Chao Tang,
Jianwei Niu,
Hao Su
Abstract:
Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model. In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the models themselves are biased, merely constraining their consistency is not sufficient to alleviate prediction bias. In this paper, we explore this bias from a Bayesian perspective and demonstrate that it principally originates from label prior bias within the training data. Building upon this insight, we propose a debiasing method for FSSL named FedDB. FedDB utilizes the Average Prediction Probability of Unlabeled Data (APP-U) to approximate the biased prior. During local training, FedDB employs APP-U to refine pseudo-labeling through Bayes' theorem, thereby significantly reducing the label prior bias. Concurrently, during the model aggregation, FedDB uses APP-U from participating clients to formulate unbiased aggregate weights, thereby effectively diminishing bias in the global model. Experimental results show that FedDB can surpass existing FSSL methods. The code is available at https://github.com/GuogangZhu/FedDB.
Submitted 30 May, 2024;
originally announced May 2024.
-
Search for the decay $B^{0}\toγγ$ using Belle and Belle II data
Authors:
Belle and Belle II Collaborations,
I. Adachi,
L. Aggarwal,
H. Aihara,
N. Akopov,
A. Aloisio,
S. Al Said,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot
, et al. (385 additional authors not shown)
Abstract:
We report the result of a search for the rare decay $B^{0} \to γγ$ using a combined dataset of $753\times10^{6}$ $B\bar{B}$ pairs collected by the Belle experiment and $387\times10^{6}$ $B\bar{B}$ pairs collected by the Belle II experiment from decays of the $\rm Υ(4S)$ resonance produced in $e^{+}e^{-}$ collisions. A simultaneous fit to the Belle and Belle II data sets yields $11.0^{+6.5}_{-5.5}$ signal events, corresponding to a 2.5$σ$ significance. We determine the branching fraction $\mathcal{B}(B^{0} \to γγ) = (3.7^{+2.2}_{-1.8}(\rm stat)\pm0.5(\rm syst))\times10^{-8}$ and set a 90% credibility level upper limit of $\mathcal{B}(B^{0} \to γγ) < 6.4\times10^{-8}$.
Submitted 30 May, 2024;
originally announced May 2024.
-
AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization
Authors:
Jiawei Chen,
Xiao Yang,
Zhengwei Fang,
Yu Tian,
Yinpeng Dong,
Zhaoxia Yin,
Hang Su
Abstract:
Despite the widespread application of large language models (LLMs) across various tasks, recent studies indicate that they are susceptible to jailbreak attacks, which can render their defense mechanisms ineffective. However, previous jailbreak research has frequently been constrained by limited universality, suboptimal efficiency, and a reliance on manual crafting. In response, we rethink the approach to jailbreaking LLMs and formally define three essential properties from the attacker's perspective, which contributes to guiding the design of jailbreak methods. We further introduce AutoBreach, a novel method for jailbreaking LLMs that requires only black-box access. Inspired by the versatility of wordplay, AutoBreach employs a wordplay-guided mapping rule sampling strategy to generate a variety of universal mapping rules for creating adversarial prompts. This generation process leverages LLMs' automatic summarization and reasoning capabilities, thus alleviating the manual burden. To boost jailbreak success rates, we further suggest sentence compression and chain-of-thought-based mapping rules to correct errors and wordplay misinterpretations in target LLMs. Additionally, we propose a two-stage mapping rule optimization strategy that initially optimizes mapping rules before querying target LLMs to enhance the efficiency of AutoBreach. AutoBreach can efficiently identify security vulnerabilities across various LLMs, including three proprietary models (Claude-3, GPT-3.5, GPT-4 Turbo) and two LLMs' web platforms (Bingchat, GPT-4 Web), achieving an average success rate of over 80% with fewer than 10 queries.
Submitted 29 May, 2024;
originally announced May 2024.
-
A novel fault localization with data refinement for hydroelectric units
Authors:
Jialong Huang,
Junlin Song,
Penglong Lian,
Mengjie Gan,
Zhiheng Su,
Benhao Wang,
Wenji Zhu,
Xiaomin Pu,
Jianxiao Zou,
Shicai Fan
Abstract:
Due to the scarcity of fault samples and the complexity of data with non-linear and non-smooth characteristics in hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to localize faults accurately. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learning (SG-WMBDL) based fault localization method for hydroelectric units is proposed. To overcome the data scarcity, an SAE is embedded into the GAN to generate more high-quality samples in the data generation module. Because the signals have non-linear and non-smooth characteristics, an improved WNR, which combines soft and hard thresholding, and local linear embedding (LLE) are used in the data preprocessing module to reduce the noise and effectively capture the local features. In addition, to seek higher performance, a novel Adaptive Boost (AdaBoost) combined with multi deep learning is proposed to achieve accurate fault localization. The experimental results show that SG-WMBDL can locate faults in hydroelectric units from a small number of fault samples with non-linear and non-smooth characteristics, with higher precision and accuracy than other frontier methods, which verifies the effectiveness and practicality of the proposed method.
Submitted 29 May, 2024;
originally announced May 2024.
-
Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
Authors:
Jinxia Yang,
Bing Su,
Wayne Xin Zhao,
Ji-Rong Wen
Abstract:
Medical vision-language pre-training methods mainly leverage the correspondence between paired medical images and radiological reports. Although multi-view spatial images and temporal sequences of image-report pairs are available in off-the-shelf multi-modal medical datasets, most existing methods have not thoroughly tapped into such extensive supervision signals. In this paper, we introduce the Med-ST framework for fine-grained spatial and temporal modeling to exploit information from multiple spatial views of chest radiographs and temporal historical records. For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views. To achieve a more comprehensive alignment, Med-ST not only establishes the global alignment between whole images and texts but also introduces modality-weighted local alignment between text tokens and spatial regions of images. For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective by forward mapping classification (FMC) and reverse mapping regression (RMR). By perceiving temporal information from simple to complex, Med-ST can learn temporal semantics. Experimental results across four distinct tasks demonstrate the effectiveness of Med-ST, especially in temporal classification tasks. Our code and model are available at https://github.com/SVT-Yang/MedST.
Submitted 29 May, 2024;
originally announced May 2024.
-
Few-shot fault diagnosis based on multi-scale graph convolution filtering for industry
Authors:
Mengjie Gan,
Penglong Lian,
Zhiheng Su,
Jiyang Zhang,
Jialong Huang,
Benhao Wang,
Jianxiao Zou,
Shicai Fan
Abstract:
Industrial equipment fault diagnosis often encounters challenges such as the scarcity of fault data, complex operating conditions, and varied types of failures. Signal analysis, data statistical learning, and conventional deep learning techniques face constraints under these conditions due to their substantial data requirements and the necessity for transfer learning to accommodate new failure modes. To effectively leverage information and extract the intrinsic characteristics of faults across different domains under limited sample conditions, this paper introduces a fault diagnosis approach employing Multi-Scale Graph Convolution Filtering (MSGCF). MSGCF enhances the traditional Graph Neural Network (GNN) framework by integrating both local and global information fusion modules within the graph convolution filter block. This advancement effectively mitigates the over-smoothing issue associated with excessive layering of graph convolutional layers while preserving a broad receptive field. It also reduces the risk of overfitting in few-shot diagnosis, thereby augmenting the model's representational capacity. Experiments on the University of Paderborn bearing dataset (PU) demonstrate that the proposed MSGCF method surpasses alternative approaches in accuracy, thereby offering valuable insights for industrial fault diagnosis in few-shot learning scenarios.
Submitted 29 May, 2024;
originally announced May 2024.
-
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Authors:
Zian Su,
Xiangzhe Xu,
Ziyang Huang,
Kaiyuan Zhang,
Xiangyu Zhang
Abstract:
Human-Oriented Binary Reverse Engineering (HOBRE) lies at the intersection of binary and source code, aiming to lift binary code to human-readable content relevant to source code, thereby bridging the binary-source semantic gap. Recent advancements in uni-modal code model pre-training, particularly in generative Source Code Foundation Models (SCFMs) and binary understanding models, have laid the groundwork for transfer learning applicable to HOBRE. However, existing approaches for HOBRE rely heavily on uni-modal models like SCFMs for supervised fine-tuning or general LLMs for prompting, resulting in sub-optimal performance. Inspired by recent progress in large multi-modal models, we propose that it is possible to harness the strengths of uni-modal code models from both sides to bridge the semantic gap effectively. In this paper, we introduce a novel probe-and-recover framework that incorporates a binary-source encoder-decoder model and black-box LLMs for binary analysis. Our approach leverages the pre-trained knowledge within SCFMs to synthesize relevant, symbol-rich code fragments as context. This additional context enables black-box LLMs to enhance recovery accuracy. We demonstrate significant improvements in zero-shot binary summarization and binary function name recovery, with a 10.3% relative gain in CHRF and a 16.7% relative gain in a GPT4-based metric for summarization, as well as a 6.7% and 7.4% absolute increase in token-level precision and recall for name recovery, respectively. These results highlight the effectiveness of our approach in automating and improving binary code analysis.
Submitted 29 May, 2024;
originally announced May 2024.
-
Quantum error detection with noise-resilient parity-controlled gate in two-dimensional Rydberg atom arrays
Authors:
F. Q. Guo,
S. L. Su,
Weibin Li,
X. Q. Shao
Abstract:
Quantum error detection relies primarily on precise measurement of qubit parity, a fundamental operation in quantum information processing. Here, we introduce a resilient parity-controlled gate tailored for detecting quantum errors within a 2D Rydberg atom array. Our method enables the discrimination between even and odd parities of virtually excited control atoms by tracking the dynamic evolution of an auxiliary atom. Using spin-exchange dipolar interactions of Rydberg states and single- and two-photon driving between ground states and Rydberg states, our method speeds up Rydberg-parity measurements by a large amount compared to previous methods. In practical application, we explore three-qubit repetition codes, standard surface codes featuring stabilizers in the forms $ZZZZ$ and $XXXX$, as well as rotated surface codes in the $XZZX$ configuration, facilitating the measurement of stabilizers with a single-shot readout. We carry out thorough numerical simulations to evaluate the feasibility of our strategy, considering potential experimental imperfections such as undesired interactions between Rydberg states, fluctuations in atomic positions, dephasing noise, and laser amplitude inhomogeneities. Particular emphasis is placed on ensuring the reliability and advantages of the physical mechanisms of the parity meter. These results affirm the robustness and viability of our protocol, positioning it as a promising candidate for quantum error detection employing the Rydberg atom system in the foreseeable future.
Submitted 29 May, 2024;
originally announced May 2024.
-
AI Risk Management Should Incorporate Both Safety and Security
Authors:
Xiangyu Qi,
Yangsibo Huang,
Yi Zeng,
Edoardo Debenedetti,
Jonas Geiping,
Luxi He,
Kaixuan Huang,
Udari Madhushani,
Vikash Sehwag,
Weijia Shi,
Boyi Wei,
Tinghao Xie,
Danqi Chen,
Pin-Yu Chen,
Jeffrey Ding,
Ruoxi Jia,
Jiaqi Ma,
Arvind Narayanan,
Weijie J Su,
Mengdi Wang,
Chaowei Xiao,
Bo Li,
Dawn Song,
Peter Henderson,
Prateek Mittal
Abstract:
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise mostly effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.
Submitted 29 May, 2024;
originally announced May 2024.
-
The Pulsar Science Collaboratory: Multi-Epoch Scintillation Studies of Pulsars
Authors:
Jacob E. Turner,
Juan G. Lebron Medina,
Zachary Zelensky,
Kathleen A. Gustavso,
Jeffrey Marx,
Manvith Kothapalli,
Luis D. Cruz Vega,
Alexander Lee,
Caryelis B. Figueroa,
Daniel E. Reichart,
Joshua B. Haislip,
Vladimir V. Kouprianov,
Steve White,
Frank Ghigo,
Sue Ann Heatherly,
Maura A. McLaughlin
Abstract:
We report on findings from scintillation analyses using high-cadence observations of nine canonical pulsars with observing baselines ranging from one to three years. We obtain scintillation bandwidth and timescale measurements for all pulsars in our survey and obtain scintillation arc curvature measurements for four pulsars, detecting multiple arcs for two of them. Using updated pulsar distance estimates, we find evidence of previously undocumented scattering screens along the line of sight (LOS) of PSRs J1645$-$0317 and J2022$+$5154, as well as evidence that one of the arcs along the LOS to PSR J2313$+$4253 may reside somewhere within the Orion-Cygnus arm of the Milky Way. By augmenting the results of previous studies, we find general agreement with estimations of scattering delays from pulsar observations and those predicted by the NE2001 electron density model. In a similar manner, we find additional evidence of a correlation between a pulsar's dispersion measure and the overall variability of its scattering delays over time. The plethora of interesting science obtained through these observations demonstrates the capabilities of the Green Bank Observatory's 20m telescope to contribute to pulsar-based studies of the interstellar medium.
Submitted 29 May, 2024;
originally announced May 2024.
-
A Multi-Source Retrieval Question Answering Framework Based on RAG
Authors:
Ridong Wu,
Shuhong Chen,
Xiangbiao Su,
Yuankai Zhu,
Yifei Liao,
Jianming Wu
Abstract:
With the rapid development of large-scale language models, Retrieval-Augmented Generation (RAG) has been widely adopted. However, existing RAG paradigms are inevitably influenced by erroneous retrieval information, thereby reducing the reliability and correctness of generated results. Therefore, to improve the relevance of retrieval information, this study proposes a method that replaces traditional retrievers with GPT-3.5, leveraging its vast corpus knowledge to generate retrieval information. We also propose a web-retrieval-based method for fine-grained knowledge retrieval, utilizing the powerful reasoning capability of GPT-3.5 to semantically partition the problem. To mitigate the illusion of GPT retrieval and reduce noise in web retrieval, we propose a multi-source retrieval framework, named MSRAG, which combines GPT retrieval with web retrieval. Experiments on multiple knowledge-intensive QA datasets demonstrate that the proposed framework performs better than existing RAG frameworks in enhancing the overall efficiency and accuracy of QA systems.
Submitted 29 May, 2024;
originally announced May 2024.