-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be…
▽ More
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale
Authors:
Wenzhen Zheng,
Wenbo Pan,
Xu Xu,
Libo Qin,
Li Yue,
Ming Zhou
Abstract:
In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pretraining (CPT) from existing pretrained LLMs, instead…
▽ More
In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pretraining (CPT) from existing pretrained LLMs, instead of using randomly initialized parameters. Based on parallel experiments on 40 model sizes ranging from 40M to 5B parameters, we find that 1) CPT converges faster and saves significant resources in a scalable manner; 2) CPT adheres to an extended scaling law derived from Hoffmann et al. (2022) with a joint data-parameter scaling term; 3) The compute-optimal data-parameter allocation for CPT markedly differs based on our estimated scaling factors; 4) The effectiveness of transfer at scale is influenced by training duration and linguistic properties, while robust to data replaying, a method that effectively mitigates catastrophic forgetting in CPT. We hope our findings provide deeper insights into the transferability of LLMs at scale for the research community.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction
Authors:
**gheng Ye,
Shang Qin,
Yinghui Li,
Xuxin Cheng,
Libo Qin,
Hai-Tao Zheng,
Peng Xing,
Zishan Xu,
Guo Cheng,
Zhao Wei
Abstract:
Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for…
▽ More
Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for Chinese EXGEC consisting of 8,216 explanation-augmented samples featuring the design of hybrid edit-wise explanations. We benchmark several series of LLMs in multiple settings, covering post-explaining and pre-explaining. To promote the development of the task, we introduce a comprehensive suite of automatic metrics and conduct human evaluation experiments to demonstrate the human consistency of the automatic metrics for free-text explanations. All the codes and data will be released after the review.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…
▽ More
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
△ Less
Submitted 2 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential dec…
▽ More
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of…
▽ More
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, an…
▽ More
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0. 04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and…
▽ More
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement
Authors:
Yunlong Feng,
Yang Xu,
Dechuan Teng,
Honglin Mu,
Xiao Xu,
Libo Qin,
Wanxiang Che,
Qingfu Zhu
Abstract:
Decompilation transforms compiled code back into a high-level programming language for analysis when source code is unavailable. Previous work has primarily focused on enhancing decompilation performance by increasing the scale of model parameters or training data for pre-training. Based on the characteristics of the decompilation task, we propose two methods: (1) Without fine-tuning, the Self-Con…
▽ More
Decompilation transforms compiled code back into a high-level programming language for analysis when source code is unavailable. Previous work has primarily focused on enhancing decompilation performance by increasing the scale of model parameters or training data for pre-training. Based on the characteristics of the decompilation task, we propose two methods: (1) Without fine-tuning, the Self-Constructed Context Decompilation (sc$^2$dec) method recompiles the LLM's decompilation results to construct pairs for in-context learning, hel** the model improve decompilation performance. (2) Fine-grained Alignment Enhancement (FAE), which meticulously aligns assembly code with source code at the statement level by leveraging debugging information, is employed during the fine-tuning phase to achieve further improvements in decompilation. By integrating these two methods, we achieved a Re-Executability performance improvement of approximately 7.35\% on the Decompile-Eval benchmark, establishing a new state-of-the-art performance of 55.03\%.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models
Authors:
Zhiyu Tan,
Xiaomeng Yang,
Luozheng Qin,
Meng** Yang,
Cheng Zhang,
Hao Li
Abstract:
The recent advancements in text-to-image generative models have been remarkable. Yet, the field suffers from a lack of evaluation metrics that accurately reflect the performance of these models, particularly lacking fine-grained metrics that can guide the optimization of the models. In this paper, we propose EvalAlign, a metric characterized by its accuracy, stability, and fine granularity. Our ap…
▽ More
The recent advancements in text-to-image generative models have been remarkable. Yet, the field suffers from a lack of evaluation metrics that accurately reflect the performance of these models, particularly lacking fine-grained metrics that can guide the optimization of the models. In this paper, we propose EvalAlign, a metric characterized by its accuracy, stability, and fine granularity. Our approach leverages the capabilities of Multimodal Large Language Models (MLLMs) pre-trained on extensive datasets. We develop evaluation protocols that focus on two key dimensions: image faithfulness and text-image alignment. Each protocol comprises a set of detailed, fine-grained instructions linked to specific scoring options, enabling precise manual scoring of the generated images. We Supervised Fine-Tune (SFT) the MLLM to align closely with human evaluative judgments, resulting in a robust evaluation model. Our comprehensive tests across 24 text-to-image generation models demonstrate that EvalAlign not only provides superior metric stability but also aligns more closely with human preferences than existing metrics, confirming its effectiveness and utility in model assessment.
△ Less
Submitted 26 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Efficient Antagonistic k-plex Enumeration in Signed Graphs
Authors:
Lantian Xu,
Rong-Hua Li,
Dong Wen,
Qiangqiang Dai,
Guoren Wang,
Lu Qin
Abstract:
A signed graph is a graph where each edge receives a sign, positive or negative. The signed graph model has been used in many real applications, such as protein complex discovery and social network analysis. Finding cohesive subgraphs in signed graphs is a fundamental problem. A k-plex is a common model for cohesive subgraphs in which every vertex is adjacent to all but at most k vertices within t…
▽ More
A signed graph is a graph where each edge receives a sign, positive or negative. The signed graph model has been used in many real applications, such as protein complex discovery and social network analysis. Finding cohesive subgraphs in signed graphs is a fundamental problem. A k-plex is a common model for cohesive subgraphs in which every vertex is adjacent to all but at most k vertices within the subgraph. In this paper, we propose the model of size-constrained antagonistic k-plex in a signed graph. The proposed model guarantees that the resulting subgraph is a k-plex and can be divided into two sub-k-plexes, both of which have positive inner edges and negative outer edges. This paper aims to identify all maximal antagonistic k-plexes in a signed graph. Through rigorous analysis, we show that the problem is NP-Hardness. We propose a novel framework for maximal antagonistic k-plexes utilizing set enumeration. Efficiency is improved through pivot pruning and early termination based on the color bound. Preprocessing techniques based on degree and dichromatic graphs effectively narrow the search space before enumeration. Extensive experiments on real-world datasets demonstrate our algorithm's efficiency, effectiveness, and scalability.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
A class of overdetermined problem for fractional Capacity
Authors:
Lei Qin,
Lu Zhang
Abstract:
In this paper, we consider an unconventional overdetermined problem through a property of concavity, which provides some characterizations of balls via Brunn-Minkowski inequalities. In this setting, our rsults can be viewed as the generalization of $p$-capacity in [14], which have its own interest.
In this paper, we consider an unconventional overdetermined problem through a property of concavity, which provides some characterizations of balls via Brunn-Minkowski inequalities. In this setting, our rsults can be viewed as the generalization of $p$-capacity in [14], which have its own interest.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction…
▽ More
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
AutoCAP: Towards Automatic Cross-lingual Alignment Planning for Zero-shot Chain-of-Thought
Authors:
Yongheng Zhang,
Qiguang Chen,
Min Li,
Wanxiang Che,
Libo Qin
Abstract:
Cross-lingual chain-of-thought can effectively complete reasoning tasks across languages, which gains increasing attention. Recently, dominant approaches in the literature improve cross-lingual alignment capabilities by integrating reasoning knowledge from different languages. Despite achieving excellent performance, current methods still have two main challenges: (1) Manual language specification…
▽ More
Cross-lingual chain-of-thought can effectively complete reasoning tasks across languages, which gains increasing attention. Recently, dominant approaches in the literature improve cross-lingual alignment capabilities by integrating reasoning knowledge from different languages. Despite achieving excellent performance, current methods still have two main challenges: (1) Manual language specification: They still highly rely on manually selecting the languages to integrate, severely affecting their generalizability; (2) Static weight allocation: Current methods simply integrate all languages equally. In fact, different language reasoning paths should have different weights to achieve better complementation and integration. Motivated by this, we introduce an Automatic Cross-lingual Alignment Planning (AutoCAP) for zero-shot chain-of-thought to address the above challenges. The core of AutoCAP consists of two components: (1) Automatic Language Selection Prompting to guide LLMs to select appropriate languages and (2) Automatic Weight Allocation Prompting to automatically allocate alignment weight scores to each reasoning path. Extensive experiments on several benchmarks reveal that AutoCAP achieves state-of-the-art performance, surpassing previous methods that required manual effort.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
WeatherQA: Can Multimodal Language Models Reason about Severe Weather?
Authors:
Chengqian Ma,
Zhanxiang Hua,
Alexandra Anderson-Frey,
Vikram Iyer,
Xin Liu,
Lianhui Qin
Abstract:
Severe convective weather events, such as hail, tornadoes, and thunderstorms, often occur quickly yet cause significant damage, costing billions of dollars every year. This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas. Can modern large foundation models perform such forecasting? Existing weather ben…
▽ More
Severe convective weather events, such as hail, tornadoes, and thunderstorms, often occur quickly yet cause significant damage, costing billions of dollars every year. This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas. Can modern large foundation models perform such forecasting? Existing weather benchmarks typically focus only on predicting time-series changes in certain weather parameters (e.g., temperature, moisture) with text-only features. In this work, we introduce WeatherQA, the first multimodal dataset designed for machines to reason about complex combinations of weather parameters (a.k.a., ingredients) and predict severe weather in real-world scenarios. The dataset includes over 8,000 (multi-images, text) pairs for diverse severe weather events. Each pair contains rich information crucial for forecasting -- the images describe the ingredients capturing environmental instability, surface observations, and radar reflectivity, and the text contains forecast analyses written by human experts. With WeatherQA, we evaluate state-of-the-art vision language models, including GPT4, Claude3.5, Gemini-1.5, and a fine-tuned Llama3-based VLM, by designing two challenging tasks: (1) multi-choice QA for predicting affected area and (2) classification of the development potential of severe convection. These tasks require deep understanding of domain knowledge (e.g., atmospheric dynamics) and complex reasoning over multimodal data (e.g., interactions between weather parameters). We show a substantial gap between the strongest VLM, GPT4o, and human reasoning. Our comprehensive case study with meteorologists further reveals the weaknesses of the models, suggesting that better training and data integration are necessary to bridge this gap. WeatherQA link: https://github.com/chengqianma/WeatherQA.
△ Less
Submitted 23 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Knowledge Distillation in Federated Learning: a Survey on Long Lasting Challenges and New Solutions
Authors:
Laiqiao Qin,
Tianqing Zhu,
Wanlei Zhou,
Philip S. Yu
Abstract:
Federated Learning (FL) is a distributed and privacy-preserving machine learning paradigm that coordinates multiple clients to train a model while kee** the raw data localized. However, this traditional FL poses some challenges, including privacy risks, data heterogeneity, communication bottlenecks, and system heterogeneity issues. To tackle these challenges, knowledge distillation (KD) has been…
▽ More
Federated Learning (FL) is a distributed and privacy-preserving machine learning paradigm that coordinates multiple clients to train a model while kee** the raw data localized. However, this traditional FL poses some challenges, including privacy risks, data heterogeneity, communication bottlenecks, and system heterogeneity issues. To tackle these challenges, knowledge distillation (KD) has been widely applied in FL since 2020. KD is a validated and efficacious model compression and enhancement algorithm. The core concept of KD involves facilitating knowledge transfer between models by exchanging logits at intermediate or output layers. These properties make KD an excellent solution for the long-lasting challenges in FL. Up to now, there have been few reviews that summarize and analyze the current trend and methods for how KD can be applied in FL efficiently. This article aims to provide a comprehensive survey of KD-based FL, focusing on addressing the above challenges. First, we provide an overview of KD-based FL, including its motivation, basics, taxonomy, and a comparison with traditional FL and where KD should execute. We also analyze the critical factors in KD-based FL in the appendix, including teachers, knowledge, data, and methods. We discuss how KD can address the challenges in FL, including privacy protection, data heterogeneity, communication efficiency, and personalization. Finally, we discuss the challenges facing KD-based FL algorithms and future research directions. We hope this survey can provide insights and guidance for researchers and practitioners in the FL area.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
CroPrompt: Cross-task Interactive Prompting for Zero-shot Spoken Language Understanding
Authors:
Libo Qin,
Fuxuan Wei,
Qiguang Chen,
**gxuan Zhou,
Shijue Huang,
Jiasheng Si,
Wenpeng Lu,
Wanxiang Che
Abstract:
Slot filling and intent detection are two highly correlated tasks in spoken language understanding (SLU). Recent SLU research attempts to explore zero-shot prompting techniques in large language models to alleviate the data scarcity problem. Nevertheless, the existing prompting work ignores the cross-task interaction information for SLU, which leads to sub-optimal performance. To solve this proble…
▽ More
Slot filling and intent detection are two highly correlated tasks in spoken language understanding (SLU). Recent SLU research attempts to explore zero-shot prompting techniques in large language models to alleviate the data scarcity problem. Nevertheless, the existing prompting work ignores the cross-task interaction information for SLU, which leads to sub-optimal performance. To solve this problem, we present the pioneering work of Cross-task Interactive Prompting (CroPrompt) for SLU, which enables the model to interactively leverage the information exchange across the correlated tasks in SLU. Additionally, we further introduce a multi-task self-consistency mechanism to mitigate the error propagation caused by the intent information injection. We conduct extensive experiments on the standard SLU benchmark and the results reveal that CroPrompt consistently outperforms the existing prompting approaches. In addition, the multi-task self-consistency mechanism can effectively ease the error propagation issue, thereby enhancing the performance. We hope this work can inspire more research on cross-task prompting for SLU.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the…
▽ More
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur…
▽ More
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The wea…
▽ More
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$,…
▽ More
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking
Authors:
Fangxu Yu,
Lai Jiang,
Haoqiang Kang,
Shibo Hao,
Lianhui Qin
Abstract:
Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution trajectories in complex reasoning problems is crucial for robust outcomes, data augmentation, and enhanced model generalization. Large language models (LLMs) often struggle with generating high-quality, diverse reasoning. While su…
▽ More
Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution trajectories in complex reasoning problems is crucial for robust outcomes, data augmentation, and enhanced model generalization. Large language models (LLMs) often struggle with generating high-quality, diverse reasoning. While supervised fine-tuning helps with quality, it requires extensive supervision data to capture the full diversity of solutions. Alternatively, reinforcement learning methods like PPO aim to find limited highest-reward solutions while neglecting the solution diversity, akin to convergent thinking. To address these limitations, we propose Flow of Reasoning (FoR) -- an efficient LLM training approach enabling diverse reasoning with minimal data. FoR formulates multi-step LLM reasoning as a Markovian flow from an initial state to terminal states. The formulation allows to adapt principled GFlowNet approaches to train the LLM as a policy, which is able to sample multiple reasoning paths with probabilities proportional to the unnormalized reward. Empirical results show that, with limited training data (e.g., 15 examples), FoR can discover diverse high-quality solutions that excel greatly beyond current state-of-the-art methods across three tasks, including embodied reasoning (BlocksWorld), math puzzle solving (Game24), and logical reasoning (PrOntoQA). Code is available at https://github.com/Yu-Fangxu/FoR.
△ Less
Submitted 24 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for…
▽ More
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ are found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at 90% confidence level.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of semileptonic $D^{+}_s$ decays via $e^+e^-\to D_s^{*+}D_s^{*-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are…
▽ More
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are ${\mathcal B}(D_s^+\to ηe^+ν_e)=(2.35\pm0.11_{\rm stat}\pm 0.10_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to η^\prime e^+ν_e)=(0.82\pm0.09_{\rm stat}\pm 0.04_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to φe^+ν_e)=(2.21\pm0.16_{\rm stat}\pm 0.11_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to f_0(980) e^+ν_e,f_0(980)\toπ^+π^-)=(0.15\pm0.02_{\rm stat}\pm 0.01_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to K^0 e^+ν_e)=(0.24\pm0.04_{\rm stat}\pm 0.01_{\rm syst})\%,$ and ${\mathcal B}(D_s^+\to K^{*0} e^+ν_e)=(0.19\pm0.03_{\rm stat}\pm 0.01_{\rm syst})\%.$ These results are consistent with those measured via the $e^+e^-\to D_s^{*\pm}D_s^{\mp}$ process by BESIII and CLEO. The hadronic transition form factors $D^+_s\to ηe^+ν_e$, $D^+_s\to η^\prime e^+ν_e$, and $D^+_s\to K^0 e^+ν_e$ at four-momentum transfer squared $q^2$ = 0 are determined to be $f^η_+(0) = 0.482 \pm 0.011_{\rm stat} \pm 0.009_{\rm syst}\pm0.004_{\rm input},$ $f^{η^{\prime}}_+(0) = 0.562 \pm 0.031_{\rm stat} \pm 0.014_{\rm
syst}\pm0.003_{\rm input},$ and $f^{K^0}_+(0) = 0.624 \pm 0.052_{\rm
stat} \pm 0.013_{\rm syst}\pm0.002_{\rm input}.$
△ Less
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
EduNLP: Towards a Unified and Modularized Library for Educational Resources
Authors:
Zhenya Huang,
Yuting Ning,
Longhu Qin,
Shiwei Tong,
Shangzi Xue,
Tong Xiao,
Xin Lin,
Jiayu Liu,
Qi Liu,
Enhong Chen,
Shi**g Wang
Abstract:
Educational resource understanding is vital to online learning platforms, which have demonstrated growing applications recently. However, researchers and developers always struggle with using existing general natural language toolkits or domain-specific models. The issue raises a need to develop an effective and easy-to-use one that benefits AI education-related research and applications. To bridg…
▽ More
Educational resource understanding is vital to online learning platforms, which have demonstrated growing applications recently. However, researchers and developers always struggle with using existing general natural language toolkits or domain-specific models. The issue raises a need to develop an effective and easy-to-use one that benefits AI education-related research and applications. To bridge this gap, we present a unified, modularized, and extensive library, EduNLP, focusing on educational resource understanding. In the library, we decouple the whole workflow to four key modules with consistent interfaces including data configuration, processing, model implementation, and model evaluation. We also provide a configurable pipeline to unify the data usage and model usage in standard ways, where users can customize their own needs. For the current version, we primarily provide 10 typical models from four categories, and 5 common downstream-evaluation tasks in the education domain on 8 subjects for users' usage. The project is released at: https://github.com/bigdata-ustc/EduNLP.
△ Less
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Guiding ChatGPT to Generate Salient Domain Summaries
Authors:
Jun Gao,
Ziqiang Cao,
Shaoyao Huang,
Luozheng Qin,
Chunhui Ai
Abstract:
ChatGPT is instruct-tuned to generate general and human-expected content to align with human preference through Reinforcement Learning from Human Feedback (RLHF), meanwhile resulting in generated responses not salient enough. Therefore, in this case, ChatGPT may fail to satisfy domain requirements in zero-shot settings, leading to poor ROUGE scores. Inspired by the In-Context Learning (ICL) and re…
▽ More
ChatGPT is instruct-tuned to generate general and human-expected content to align with human preference through Reinforcement Learning from Human Feedback (RLHF), meanwhile resulting in generated responses not salient enough. Therefore, in this case, ChatGPT may fail to satisfy domain requirements in zero-shot settings, leading to poor ROUGE scores. Inspired by the In-Context Learning (ICL) and retelling ability of ChatGPT, this paper proposes PADS, a \textbf{P}ipeline for \textbf{A}ssisting ChatGPT in \textbf{D}omain \textbf{S}ummarization. PADS consists of a retriever to retrieve similar examples from corpora and a rank model to rerank the multiple candidate summaries generated by ChatGPT. Specifically, given an inference document, we first retrieve an in-context demonstration via the retriever. Then, we require ChatGPT to generate $k$ candidate summaries for the inference document at a time under the guidance of the retrieved demonstration. Finally, the rank model independently scores the $k$ candidate summaries according to their quality and selects the optimal one. We extensively explore dense and sparse retrieval methods to select effective demonstrations for reference and efficiently train the rank model to reflect the quality of candidate summaries for each given summarized document. Additionally, PADS contains merely 400M trainable parameters originating from the rank model and we merely collect 2.5k data to train it. We evaluate PADS on five datasets from different domains, and the result indicates that each module in PADS is committed to effectively guiding ChatGPT to generate salient summaries fitting different domain requirements. Specifically, in the popular summarization dataset Gigaword, PADS achieves over +8 gain on ROUGE-L, compared with the naive ChatGPT in the zero-shot setting. \footnote{Our code are available at \url{https://github.com/jungao1106/PADS}}
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Search for $e^{+}e^{-}\toη'ψ(2S)$ at center-of-mass energies from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence lev…
▽ More
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence level are determined.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Realization of a cold atom gyroscope in space
Authors:
**ting Li,
Xi Chen,
Danfang Zhang,
Wenzhang Wang,
Yang Zhou,
Meng He,
Jie Fang,
Lin Zhou,
Chuan He,
Junjie Jiang,
Huanyao Sun,
Qunfeng Chen,
Lei Qin,
Xiao Li,
Yibo Wang,
Xiaowei Zhang,
Jiaqi Zhong,
Runbing Li,
Meizhen An,
Long Zhang,
Shuquan Wang,
Zongfeng Li,
** Wang,
Mingsheng Zhan
Abstract:
High precision gyroscopes in space are important for sophisticated scientific experiments and deep space navigation. Microgravity in the space provides an ideal condition for operation of a cold atom gyroscope. To demonstrate this advantage, an atom interferometer (AI) was launched and installed in the China Space Station in 2022. Here reported is a realization of the cold atom gyroscope with this…
▽ More
High precision gyroscopes in space are important for sophisticated scientific experiments and deep space navigation. Microgravity in the space provides an ideal condition for operation of a cold atom gyroscope. To demonstrate this advantage, an atom interferometer (AI) was launched and installed in the China Space Station in 2022. Here reported is a realization of the cold atom gyroscope with this AI. By applying point source interferometry, spatial fringes are obtained and acceleration and rotation are extracted. The angles of the Raman lasers are precisely calibrated to avoid measurement error, and other systematic errors are also considered for the rotation measurement. The evaluated rotation measurement is (-115.64+/-1.71)*10^-5 rad/s in space, and an acceleration measurement resolution of 1.03*10^-6 m/s^2 is also obtained for a single image. This study conducts the first AI-based gyroscope in space and paves a way for future space-based AI experiments.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ} \rightarrow Λ\barΛφ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured t…
▽ More
Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured to be $( 2.99\pm1.24\pm0.19) \times 10^{-5}$, $(6.01\pm0.90\pm0.40 )\times 10^{-5}$, and $(7.13\pm0.81\pm0.36) \times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No obvious enhancement near the $Λ\barΛ$ production threshold or excited $Λ$ state is found in the $Λφ$ (or $\barΛφ$) system.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought
Authors:
Qiguang Chen,
Libo Qin,
** Zhang,
Zhi Chen,
Xiao Xu,
Wanxiang Che
Abstract:
Multi-modal Chain-of-Thought (MCoT) requires models to leverage knowledge from both textual and visual modalities for step-by-step reasoning, which gains increasing attention. Nevertheless, the current MCoT benchmark still faces some challenges: (1) absence of visual modal reasoning, (2) single-step visual modal reasoning, and (3) Domain missing, thereby hindering the development of MCoT. Motivate…
▽ More
Multi-modal Chain-of-Thought (MCoT) requires models to leverage knowledge from both textual and visual modalities for step-by-step reasoning, which gains increasing attention. Nevertheless, the current MCoT benchmark still faces some challenges: (1) absence of visual modal reasoning, (2) single-step visual modal reasoning, and (3) Domain missing, thereby hindering the development of MCoT. Motivated by this, we introduce a novel benchmark (M$^3$CoT) to address the above challenges, advancing the multi-domain, multi-step, and multi-modal CoT. Additionally, we conduct a thorough evaluation involving abundant MCoT approaches on Vision Large Language Models (VLLMs). In addition, we highlight that the current VLLMs still struggle to correctly reason in M$^3$CoT and there remains a large gap between existing VLLMs and human performance in M$^3$CoT, despite their superior results on previous MCoT benchmarks. To our knowledge, we take the first meaningful step toward the multi-domain, multi-step, and multi-modal scenario in MCoT. We hope that M$^3$CoT can serve as a valuable resource, providing a pioneering foundation in multi-domain, multi-step, multi-modal chain-of-thought research.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
Performance Optimization in RSMA-assisted Uplink xURLLC IIoT Networks with Statistical QoS Provisioning
Authors:
Yuang Chen,
Hancheng Lu,
Chang Wu,
Langtian Qin,
Xiaobo Guo
Abstract:
Industry 5.0 and beyond networks have driven the emergence of numerous mission-critical applications, prompting contemplation of the neXt-generation ultra-reliable low-latency communication (xURLLC). To guarantee low-latency requirements, xURLLC heavily relies on short-blocklength packets with sporadic arrival traffic. As a disruptive multi-access technique, rate-splitting multiple access (RSMA) h…
▽ More
Industry 5.0 and beyond networks have driven the emergence of numerous mission-critical applications, prompting contemplation of the neXt-generation ultra-reliable low-latency communication (xURLLC). To guarantee low-latency requirements, xURLLC heavily relies on short-blocklength packets with sporadic arrival traffic. As a disruptive multi-access technique, rate-splitting multiple access (RSMA) has emerged as a promising avenue to enhance quality of service (QoS) and flexibly manage interference for next-generation communication networks. In this paper, we investigate an innovative RSMA-assisted uplink xURLLC industrial internet-of-things (IIoT) (RSMA-xURLLC-IIoT) network. To unveil reliable insights into the statistical QoS provisioning (SQP) for our proposed network with sporadic arrival traffic, we leverage stochastic network calculus (SNC) to develop a dependable theoretical framework. Building upon this theoretical framework, we formulate the SQP-driven short-packet size maximization problem and the SQP-driven transmit power minimization problem, aiming to guarantee the SQP performance to latency, decoding, and reliability while maximizing the short-packet size and minimizing the transmit power, respectively. By exploiting Monte-Carlo methods, we have thoroughly validated the dependability of the developed theoretical framework. Moreover, through extensive comparison analysis with state-of-the-art multi-access techniques, including non-orthogonal multiple access (NOMA) and orthogonal multiple access (OMA), we have demonstrated the superior performance gains achieved by the proposed RSMA-xURLLC-IIoT networks.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
Endowing Interpretability for Neural Cognitive Diagnosis by Efficient Kolmogorov-Arnold Networks
Authors:
Shangshang Yang,
Linrui Qin,
Xiaoshan Yu
Abstract:
In the realm of intelligent education, cognitive diagnosis plays a crucial role in subsequent recommendation tasks attributed to the revealed students' proficiency in knowledge concepts. Although neural network-based neural cognitive diagnosis models (CDMs) have exhibited significantly better performance than traditional models, neural cognitive diagnosis is criticized for the poor model interpret…
▽ More
In the realm of intelligent education, cognitive diagnosis plays a crucial role in subsequent recommendation tasks attributed to the revealed students' proficiency in knowledge concepts. Although neural network-based neural cognitive diagnosis models (CDMs) have exhibited significantly better performance than traditional models, neural cognitive diagnosis is criticized for the poor model interpretability due to the multi-layer perception (MLP) employed, even with the monotonicity assumption. Therefore, this paper proposes to empower the interpretability of neural cognitive diagnosis models through efficient kolmogorov-arnold networks (KANs), named KAN2CD, where KANs are designed to enhance interpretability in two manners. Specifically, in the first manner, KANs are directly used to replace the used MLPs in existing neural CDMs; while in the second manner, the student embedding, exercise embedding, and concept embedding are directly processed by several KANs, and then their outputs are further combined and learned in a unified KAN to get final predictions. To overcome the problem of training KANs slowly, we modify the implementation of original KANs to accelerate the training. Experiments on four real-world datasets show that the proposed KA2NCD exhibits better performance than traditional CDMs, and the proposed KA2NCD still has a bit of performance leading even over the existing neural CDMs. More importantly, the learned structures of KANs enable the proposed KA2NCD to hold as good interpretability as traditional CDMs, which is superior to existing neural CDMs. Besides, the training cost of the proposed KA2NCD is competitive to existing models.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ}\toΛ\barΛω$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$,…
▽ More
Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$, $\mathcal{B}(χ_{c1}\toΛ\barΛω)=({1.01 \pm 0.10 \pm 0.11}) \times 10^{-4}$, and $\mathcal{B}(χ_{c2}\toΛ\barΛω)=({1.40 \pm 0.13 \pm 0.17}) \times 10^{-4}$, where the first uncertainties are statistical and the second are systematic. We observe no clear intermediate structures.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
Authors:
Zhiyu Tan,
Meng** Yang,
Luozheng Qin,
Hao Yang,
Ye Qian,
Qiang Zhou,
Cheng Zhang,
Hao Li
Abstract:
One critical prerequisite for faithful text-to-image generation is the accurate understanding of text inputs. Existing methods leverage the text encoder of the CLIP model to represent input prompts. However, the pre-trained CLIP model can merely encode English with a maximum token length of 77. Moreover, the model capacity of the text encoder from CLIP is relatively limited compared to Large Langu…
▽ More
One critical prerequisite for faithful text-to-image generation is the accurate understanding of text inputs. Existing methods leverage the text encoder of the CLIP model to represent input prompts. However, the pre-trained CLIP model can merely encode English with a maximum token length of 77. Moreover, the model capacity of the text encoder from CLIP is relatively limited compared to Large Language Models (LLMs), which offer multilingual input, accommodate longer context, and achieve superior text representation. In this paper, we investigate LLMs as the text encoder to improve the language understanding in text-to-image generation. Unfortunately, training text-to-image generative model with LLMs from scratch demands significant computational resources and data. To this end, we introduce a three-stage training pipeline that effectively and efficiently integrates the existing text-to-image model with LLMs. Specifically, we propose a lightweight adapter that enables fast training of the text-to-image model using the textual representations from LLMs. Extensive experiments demonstrate that our model supports not only multilingual but also longer input context with superior image generation quality.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Efficient Influence Minimization via Node Blocking
Authors:
**ghao Wang,
Yan** Wu,
Xiaoyang Wang,
Ying Zhang,
Lu Qin,
Wenjie Zhang,
Xuemin Lin
Abstract:
Given a graph G, a budget k and a misinformation seed set S, Influence Minimization (IMIN) via node blocking aims to find a set of k nodes to be blocked such that the expected spread of S is minimized. This problem finds important applications in suppressing the spread of misinformation and has been extensively studied in the literature. However, existing solutions for IMIN still incur significant…
▽ More
Given a graph G, a budget k and a misinformation seed set S, Influence Minimization (IMIN) via node blocking aims to find a set of k nodes to be blocked such that the expected spread of S is minimized. This problem finds important applications in suppressing the spread of misinformation and has been extensively studied in the literature. However, existing solutions for IMIN still incur significant computation overhead, especially when k becomes large. In addition, there is still no approximation solution with non-trivial theoretical guarantee for IMIN via node blocking prior to our work. In this paper, we conduct the first attempt to propose algorithms that yield data-dependent approximation guarantees. Based on the Sandwich framework, we first develop submodular and monotonic lower and upper bounds for our non-submodular objective function and prove the computation of proposed bounds is \#P-hard. In addition, two advanced sampling methods are proposed to estimate the value of bounding functions. Moreover, we develop two novel martingale-based concentration bounds to reduce the sample complexity and design two non-trivial algorithms that provide (1-1/e-ε)-approximate solutions to our bounding functions. Comprehensive experiments on 9 real-world datasets are conducted to validate the efficiency and effectiveness of the proposed techniques. Compared with the state-of-the-art methods, our solutions can achieve up to two orders of magnitude speedup and provide theoretical guarantees for the quality of returned results.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Large Language Models Meet NLP: A Survey
Authors:
Libo Qin,
Qiguang Chen,
Xiachong Feng,
Yang Wu,
Yongheng Zhang,
Yinghui Li,
Min Li,
Wanxiang Che,
Philip S. Yu
Abstract:
While large language models (LLMs) like ChatGPT have shown impressive capabilities in Natural Language Processing (NLP) tasks, a systematic investigation of their potential in this field remains largely unexplored. This study aims to address this gap by exploring the following questions: (1) How are LLMs currently applied to NLP tasks in the literature? (2) Have traditional NLP tasks already been…
▽ More
While large language models (LLMs) like ChatGPT have shown impressive capabilities in Natural Language Processing (NLP) tasks, a systematic investigation of their potential in this field remains largely unexplored. This study aims to address this gap by exploring the following questions: (1) How are LLMs currently applied to NLP tasks in the literature? (2) Have traditional NLP tasks already been solved with LLMs? (3) What is the future of the LLMs for NLP? To answer these questions, we take the first step to provide a comprehensive overview of LLMs in NLP. Specifically, we first introduce a unified taxonomy including (1) parameter-frozen application and (2) parameter-tuning application to offer a unified perspective for understanding the current progress of LLMs in NLP. Furthermore, we summarize the new frontiers and the associated challenges, aiming to inspire further groundbreaking advancements. We hope this work offers valuable insights into the {potential and limitations} of LLMs in NLP, while also serving as a practical guide for building effective LLMs in NLP.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Precision measurement of the branching fraction of \boldmath $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (604 additional authors not shown)
Abstract:
Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$.
The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with sig…
▽ More
Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$.
The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with significantly improved precision.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Postselected amplification applied to atomic magnetometers
Authors:
Yazhi Niu,
Jialin Li,
Lupei Qin,
Xin-Qi Li
Abstract:
We propose to embed the atomic magnetometer (AM) into an optical Mach-Zehnder interferometer (MZI). We analyze the effect of amplification of the Faraday rotation angle of the probe laser light, by properly postselecting the path-information state of the laser photons when passing through the MZI. The proposed scheme provides a postselected metrological protocol of probing weak magnetic fields, ha…
▽ More
We propose to embed the atomic magnetometer (AM) into an optical Mach-Zehnder interferometer (MZI). We analyze the effect of amplification of the Faraday rotation angle of the probe laser light, by properly postselecting the path-information state of the laser photons when passing through the MZI. The proposed scheme provides a postselected metrological protocol of probing weak magnetic fields, having a potential to further enhance the sensitivity of the nowadays state-of-the-art optical AM.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Improved measurement of the branching fraction of $h_{c}\rightarrowγη^\prime/η$ and search for $h_{c}\rightarrowγπ^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
The processes $h_c\rightarrowγP(P = η^\prime,~η,~π^{0}))$ are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the…
▽ More
The processes $h_c\rightarrowγP(P = η^\prime,~η,~π^{0}))$ are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the first uncertainties are statistical, the second systematic, and the third from the branching fraction of $ψ(3686)\rightarrowπ^{0}h_c$. The ratio $R_{h_c}=\frac{\mathscr{B}(h_c\rightarrowγη)}{\mathscr{B}(h_c\rightarrowγη^\prime)}$ is calculated to be $(27.0\pm4.4\pm1.0)\%$. The measurements are consistent with the previous results with improved precision by a factor of 2. The results are valuable for gaining a deeper understanding of $η-η^\prime$ mixing, and its manifestation within quantum chromodynamics. No significant signal is found for the decay $h_c\rightarrowγπ^{0}$, and an upper limit is placed on its branching fraction of $\mathscr{B}(h_c\rightarrowγπ^{0})<5.0\times10^{-5}$, at the 90\% confidence level.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
V. Batozskaya,
D. Becker,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko
, et al. (559 additional authors not shown)
Abstract:
We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for…
▽ More
We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90\% confidence level, respectively.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (635 additional authors not shown)
Abstract:
Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions…
▽ More
Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions $\mathcal{B}(χ_{c1}(3872)\toγψ_2(3823), ψ_2(3823)\toγχ_{c1})/\mathcal{B}(χ_{c1}(3872)\toπ^+π^- J/ψ)$ is set as 0.075 at the 90\% confidence level. Our result contradicts theoretical predictions under the assumption that the $χ_{c1}(3872)$ is the pure charmonium state $χ_{c1}(2P)$.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the…
▽ More
The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the $p\bar{p}π^0$ energy threshold, we can probe the threshold behavior for this reaction. However, no anomalous threshold enhancement is found in the cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
Authors:
Ting Liu,
Xuyang Liu,
Siteng Huang,
Honggang Chen,
Quanjun Yin,
Long Qin,
Donglin Wang,
Yue Hu
Abstract:
Visual grounding (VG) is a challenging task to localize an object in an image based on a textual description. Recent surge in the scale of VG models has substantially improved performance, but also introduced a significant burden on computational costs during fine-tuning. In this paper, we explore applying parameter-efficient transfer learning (PETL) to efficiently transfer the pre-trained vision-…
▽ More
Visual grounding (VG) is a challenging task to localize an object in an image based on a textual description. Recent surge in the scale of VG models has substantially improved performance, but also introduced a significant burden on computational costs during fine-tuning. In this paper, we explore applying parameter-efficient transfer learning (PETL) to efficiently transfer the pre-trained vision-language knowledge to VG. Specifically, we propose \textbf{DARA}, a novel PETL method comprising \underline{\textbf{D}}omain-aware \underline{\textbf{A}}dapters (DA Adapters) and \underline{\textbf{R}}elation-aware \underline{\textbf{A}}dapters (RA Adapters) for VG. DA Adapters first transfer intra-modality representations to be more fine-grained for the VG domain. Then RA Adapters share weights to bridge the relation between two modalities, improving spatial reasoning. Empirical results on widely-used benchmarks demonstrate that DARA achieves the best accuracy while saving numerous updated parameters compared to the full fine-tuning and other PETL methods. Notably, with only \textbf{2.13\%} tunable backbone parameters, DARA improves average accuracy by \textbf{0.81\%} across the three benchmarks compared to the baseline model. Our code is available at \url{https://github.com/liuting20/DARA}.
△ Less
Submitted 8 June, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL
Authors:
Lang Qin,
Ziming Wang,
Runhao Jiang,
Rui Yan,
Hua** Tang
Abstract:
Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms,…
▽ More
Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with similar performance as recurrent neural networks (RNNs) but with about 50% power consumption.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be…
▽ More
Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\toωX(3872)$ to $e^+e^-\toωχ_{c1}(ωχ_{c2})$ to be $σ_{ωX(3872)}/σ_{ωχ_{c1}}~(σ_{ωX(3872)}/σ_{ωχ_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~ (5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process of $e^+e^-\toγX(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\toγX(3872)$ to $e^+e^-\toωX(3872)$ is set as $σ_{γX(3872)}/σ_{ωX(3872)}<0.23$ at 90\% confidence level.
△ Less
Submitted 21 April, 2024;
originally announced April 2024.
-
Observation of $D \to a_{0}(980)π$ in the decays $D^{0} \rightarrow π^{+}π^{-}η$ and $D^{+} \rightarrow π^{+}π^{0}η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the…
▽ More
We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the $D^{0(+)} \to a_{0}(980)^{-(0)} π^{+}$ contribution. The ratios $\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{+}π^{-})/\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{-}π^{+})$ and $\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{+}π^{0})/\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{0}π^{+})$ are measured to be $7.5^{+2.5}_{-0.8\,\mathrm{stat.}}\pm1.7_{\mathrm{syst.}}$ and $2.6\pm0.6_{\mathrm{stat.}}\pm0.3_{\mathrm{syst.}}$, respectively. The measured $D^{0}$ ratio disagrees with the theoretical predictions by orders of magnitudes, thus implying a substantial contribution from final-state interactions.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
EduAgent: Generative Student Agents in Learning
Authors:
Songlin Xu,
Xinyu Zhang,
Lianhui Qin
Abstract:
Student simulation in online education is important to address dynamic learning behaviors of students with diverse backgrounds. Existing simulation models based on deep learning usually need massive training data, lacking prior knowledge in educational contexts. Large language models (LLMs) may contain such prior knowledge since they are pre-trained from a large corpus. However, because student be…
▽ More
Student simulation in online education is important to address dynamic learning behaviors of students with diverse backgrounds. Existing simulation models based on deep learning usually need massive training data, lacking prior knowledge in educational contexts. Large language models (LLMs) may contain such prior knowledge since they are pre-trained from a large corpus. However, because student behaviors are dynamic and multifaceted with individual differences, directly prompting LLMs is not robust nor accurate enough to capture fine-grained interactions among diverse student personas, learning behaviors, and learning outcomes. This work tackles this problem by presenting a newly annotated fine-grained large-scale dataset and proposing EduAgent, a novel generative agent framework incorporating cognitive prior knowledge (i.e., theoretical findings revealed in cognitive science) to guide LLMs to first reason correlations among various behaviors and then make simulations. Our two experiments show that EduAgent could not only mimic and predict learning behaviors of real students but also generate realistic learning behaviors of virtual students without real data.
△ Less
Submitted 23 March, 2024;
originally announced April 2024.
-
Measurement of $e^{+}e^{-}\to ωη^{\prime}$ cross sections at $\sqrt{s}=$ 2.000 to 3.080 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (599 additional authors not shown)
Abstract:
The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be…
▽ More
The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be $Γ_{R}=(167\pm77\pm7)~\rm{MeV}$, where the first uncertainties are statistical and the second are systematic.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
Improving Language Model Reasoning with Self-motivated Learning
Authors:
Yunlong Feng,
Yang Xu,
Libo Qin,
Yasheng Wang,
Wanxiang Che
Abstract:
Large-scale high-quality training data is important for improving the performance of models. After trained with data that has rationales (reasoning steps), models gain reasoning capability. However, the dataset with high-quality rationales is relatively scarce due to the high annotation cost. To address this issue, we propose \textit{Self-motivated Learning} framework. The framework motivates the…
▽ More
Large-scale high-quality training data is important for improving the performance of models. After trained with data that has rationales (reasoning steps), models gain reasoning capability. However, the dataset with high-quality rationales is relatively scarce due to the high annotation cost. To address this issue, we propose \textit{Self-motivated Learning} framework. The framework motivates the model itself to automatically generate rationales on existing datasets. Based on the inherent rank from correctness across multiple rationales, the model learns to generate better rationales, leading to higher reasoning capability. Specifically, we train a reward model with the rank to evaluate the quality of rationales, and improve the performance of reasoning through reinforcement learning. Experiment results of Llama2 7B on multiple reasoning datasets show that our method significantly improves the reasoning ability of models, even outperforming text-davinci-002 in some datasets.
△ Less
Submitted 30 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Measurement of the Born cross section for $e^{+}e^{-}\to ηh_c $ at center-of-mass energies between 4.1 and 4.6\,GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth,…
▽ More
We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth, where the first uncertainties are statistical and the second systematic.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.