Search | arXiv e-print repository

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

Authors: Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu

Abstract: Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses, potentially causing the model to overlook its inherent knowledge even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. In situations where knowledge is lacking, these systems should ideally respond with "unknown" when the answer is unattainable. In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs when facing noisy, irrelevant documents and when handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. We employed ChatGPT to create training data for CoN, which was then used to train a LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs.
Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.
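The abstract describes CoN as writing a reading note for each retrieved document before answering, with "unknown" as an allowed answer, and reports results in exact match (EM). The following is a minimal Python sketch of how such a prompt could be assembled and how an answer could be scored. The instruction wording, the `Document` type, and the `Final answer:` marker are hypothetical assumptions, not the authors' prompt or code. The EM function uses the common SQuAD-style answer normalization.

```python
import re
import string
from dataclasses import dataclass
from typing import List

# Hypothetical instruction text; the actual CoN prompt is not given in the abstract.
CON_INSTRUCTION = (
    "For each retrieved document, write a brief reading note saying whether it is "
    "relevant to the question and what it contributes. Then combine the notes into a "
    "final answer. If neither the documents nor your own knowledge suffice, answer 'unknown'."
)

@dataclass
class Document:
    title: str
    text: str

def build_con_prompt(question: str, docs: List[Document]) -> str:
    """Assemble a Chain-of-Note style prompt: instruction, numbered documents, question."""
    lines = [CON_INSTRUCTION, ""]
    for i, doc in enumerate(docs, start=1):
        lines.append(f"Document [{i}] ({doc.title}): {doc.text}")
    lines += ["", f"Question: {question}", "Reading notes:"]
    return "\n".join(lines)

def parse_final_answer(completion: str) -> str:
    """Return the text after the last 'Final answer:' marker (the whole completion if absent)."""
    marker = "Final answer:"
    idx = completion.rfind(marker)
    return (completion[idx + len(marker):] if idx >= 0 else completion).strip()

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> int:
    """1 if the normalized prediction equals the normalized gold answer, else 0."""
    return int(normalize(prediction) == normalize(gold))
```

A model completion would then contain one note per document followed by `Final answer: ...`. An "unknown" answer counts as a rejection rather than an EM hit.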

Submitted 15 November, 2023; originally announced November 2023.

Comments: Preprint

arXiv:2311.07043 [pdf, ps, other]

Study of the decay $J/ψ\to φπ^{0}η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (604 additional authors not shown)

Abstract: Based on $(10.09 \pm 0.04) \times 10^9$ $J/ψ$ events collected with the BESIII detector operating at the BEPCII collider, a partial wave analysis of the decay $J/ψ\to φπ^{0}η$ is performed. We observe for the first time two new structures in the $φη$ invariant mass distribution, with statistical significances of $24.0σ$ and $16.9σ$. The first has $J^{\rm PC}$ = $1^{+-}$, mass M = (1911 $\pm$ 6 (stat.) $\pm$ 14 (sys.)) MeV/$c^{2}$, and width $Γ$ = (149 $\pm$ 12 (stat.) $\pm$ 23 (sys.)) MeV; the second has $J^{\rm PC}$ = $1^{--}$, mass M = (1996 $\pm$ 11 (stat.) $\pm$ 30 (sys.)) MeV/$c^{2}$, and width $Γ$ = (148 $\pm$ 16 (stat.) $\pm$ 66 (sys.)) MeV. These measurements provide important input for the strangeonium spectrum. In addition, the $f_0(980)-a_0(980)^0$ mixing signal in $J/ψ\to φf_0(980) \to φa_0(980)^0$ and the corresponding electromagnetic decay $J/ψ\to φa_0(980)^0$ are measured with improved precision, providing crucial information for understanding the nature of the $a_0(980)^0$ and $f_0(980)$.
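To illustrate what a mass M and width Γ mean for a resonance, the Python sketch below evaluates a constant-width relativistic Breit-Wigner intensity at the central values quoted above. It also checks that the full width at half maximum in invariant mass comes out close to Γ. The lineshape choice is an assumption made for illustration. A partial wave analysis typically uses more elaborate amplitudes, and none of that machinery is reproduced here.

```python
import math

def rel_bw_intensity(m: float, mass: float, width: float) -> float:
    """|A(m)|^2 for a constant-width relativistic Breit-Wigner, A = 1 / (M^2 - m^2 - i*M*Gamma)."""
    amplitude = 1.0 / complex(mass**2 - m**2, -mass * width)
    return abs(amplitude) ** 2

def fwhm(mass: float, width: float) -> float:
    """Full width at half maximum in m: |A|^2 halves where |M^2 - m^2| = M*Gamma."""
    upper = math.sqrt(mass**2 + mass * width)
    lower = math.sqrt(mass**2 - mass * width)
    return upper - lower

# Central values quoted in the abstract, in MeV (uncertainties ignored here).
STRUCTURES = {"1+- structure": (1911.0, 149.0), "1-- structure": (1996.0, 148.0)}

for name, (mass, width) in STRUCTURES.items():
    print(f"{name}: peak at {mass} MeV, FWHM ~ {fwhm(mass, width):.1f} MeV (Gamma = {width} MeV)")
```

For widths small compared with the mass, the printed FWHM stays within about 0.1 MeV of Γ. This is why Γ can be read directly as the peak width.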

Submitted 14 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.06883 [pdf, other]

doi 10.1103/PhysRevD.109.L091101

Evidence of the Singly Cabibbo Suppressed decay $Λ_c^+\to pπ^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (600 additional authors not shown)

Abstract: Evidence for the singly Cabibbo suppressed decay $Λ_c^+\to pπ^0$ is reported for the first time with a statistical significance of $3.7σ$ based on 6.0 $\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies between 4.600 and 4.843 GeV with the BESIII detector at the BEPCII collider. The absolute branching fraction of $Λ_c^+\to pπ^0$ is measured to be $(1.56^{+0.72}_{-0.58}\pm0.20)\times 10^{-4}$. Combined with the branching fraction of $Λ_c^+\to nπ^+$, $(6.6\pm1.3)\times10^{-4}$, the ratio of the branching fractions of $Λ_c^+\to nπ^+$ and $Λ_c^+\to pπ^0$ is calculated to be $3.2^{+2.2}_{-1.2}$. As an important input for the theoretical models describing the decay mechanisms of charmed baryons, our result indicates that the non-factorizable contributions play an essential role and their interference with the factorizable contributions should not be significant. In addition, the absolute branching fraction of $Λ_c^+\to pη$ is measured to be $(1.63\pm0.31_{\rm stat}\pm0.11_{\rm syst}) \times10^{-3}$.

Submitted 3 June, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: 9 pages, 3 figures

Journal ref: Phys. Rev. D 109, L091101 (2024)

arXiv:2311.05955 [pdf, other]

Observation and branching fraction measurement of the decay $J\!/\!ψ\rightarrow \bar{p} Σ^{+} K_{S}^{0} + c.c.$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (602 additional authors not shown)

Abstract: The first observation of the decays $J\!/\!ψ\rightarrow \bar{p} Σ^{+} K_{S}^{0}$ and $J\!/\!ψ\rightarrow p \barΣ^{-} K_{S}^{0}$ is reported using $(10087\pm44)\times10^{6}$ $J\!/\!ψ$ events recorded by the BESIII detector at the BEPCII storage ring. The branching fraction of each channel is determined to be $\mathcal{B}(J\!/\!ψ\rightarrow \bar{p} Σ^{+} K_{S}^{0})=(1.361 \pm 0.006 \pm 0.025) \times 10^{-4}$ and $\mathcal{B}(J\!/\!ψ\rightarrow p \barΣ^{-} K_{S}^{0})=(1.352 \pm 0.006 \pm 0.025) \times 10^{-4}$. The combined result is $\mathcal{B}(J\!/\!ψ\rightarrow \bar{p} Σ^{+} K_{S}^{0} +c.c.)=(2.725 \pm 0.009 \pm 0.050) \times 10^{-4}$, where the first uncertainty is statistical and the second systematic. The results are in good agreement with the branching fractions of the isospin partner decay $J\!/\!ψ\rightarrow p K^- \barΣ^0 + c.c.$.

Submitted 14 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

arXiv:2311.03687 [pdf, other]

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Authors: Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

Abstract: Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has resulted in numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs is expensive, as it requires considerable computing resources and memory; hence many efficient approaches have been developed for improving system pipelines as well as operators. However, the runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we aim to benchmark the performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs of different sizes, i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B), on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, and FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmark and findings help them better understand different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses reveal potential opportunities for future work to further optimize the runtime performance of LLMs.
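To put the 7B, 13B, and 70B sizes in context, here is a back-of-envelope Python sketch of memory per model. It is not derived from the paper's measurements. It assumes 2 bytes per parameter for fp16/bf16 weights and 16 bytes per parameter for mixed-precision Adam model states, which is the accounting used in the ZeRO paper. It also assumes ZeRO stage 3 splits those states evenly across 8 GPUs. Activations, KV cache, and framework overhead are ignored.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory for the weights alone (no KV cache, activations, or framework overhead)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

def adam_model_states_gb(n_params: float) -> float:
    """Mixed-precision Adam model states: 2 B fp16 weights + 2 B fp16 grads
    + 12 B fp32 copies of weights, momentum and variance = 16 B per parameter."""
    return n_params * 16 / 1e9

def zero3_model_states_per_gpu_gb(n_params: float, n_gpus: int) -> float:
    """ZeRO stage 3 partitions all model states evenly across data-parallel GPUs."""
    return adam_model_states_gb(n_params) / n_gpus

for size in (7e9, 13e9, 70e9):
    print(f"{size / 1e9:.0f}B: fp16 weights {weight_memory_gb(size, 'fp16'):.0f} GB, "
          f"int4 weights {weight_memory_gb(size, 'int4'):.1f} GB, "
          f"Adam states {adam_model_states_gb(size):.0f} GB "
          f"({zero3_model_states_per_gpu_gb(size, 8):.0f} GB/GPU with ZeRO-3 on 8 GPUs)")
```

The 70B row gives 1120 GB of model states, or 140 GB per GPU even with ZeRO-3 on 8 GPUs. This shows why techniques such as quantization and offloading matter at that scale.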

Submitted 1 December, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02347 [pdf, other]

doi 10.1103/PhysRevD.109.052001

Measurement of the absolute branching fraction of the three-body decay $Λ_{c}^+ \to Ξ^{0}K^{+}π^{0}$ and search for $Λ_{c}^+ \to nK^+π^0$, $Σ^{0}K^{+}π^{0}$ and $ΛK^{+}π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (600 additional authors not shown)

Abstract: The Cabibbo-favored decay $Λ_{c}^+ \to Ξ^{0}K^{+}π^{0}$ is studied for the first time using 6.1 fb$^{-1}$ of $e^+e^-$ collision data at center-of-mass energies between 4.600 and 4.840 GeV, collected with the BESIII detector at the BEPCII collider. With a double-tag method, the branching fraction of the three-body decay $Λ_{c}^+ \to Ξ^{0}K^{+}π^{0}$ is measured to be $(7.79 \pm 1.46 \pm 0.71) \times 10^{-3}$, where the first and second uncertainties are statistical and systematic, respectively. The branching fraction of the two-body decay $Λ_{c}^+ \to Ξ(1530)^{0}K^+$ is $(5.99\pm1.04\pm0.29)\times10^{-3}$, which is consistent with the previous result of $(5.02\pm0.99\pm0.31)\times 10^{-3}$. In addition, the upper limit on the branching fraction of the doubly Cabibbo-suppressed decay $Λ_{c}^+ \to nK^+π^0$ is $7.1 \times 10^{-4}$ at the 90% confidence level. The upper limits on the branching fractions of $Λ_{c}^+ \to Σ^{0}K^{+}π^{0}$ and $ΛK^{+}π^{0}$ are also determined to be $1.8\times 10^{-3}$ and $2.0 \times 10^{-3}$, respectively.
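The double-tag method mentioned above is commonly written as B_sig = N_DT / Σ_i (N_ST,i × ε_DT,i / ε_ST,i), where i runs over the tag modes. Because the signal enters only through the efficiency ratio ε_DT/ε_ST, many tag-side systematic effects largely cancel. The Python sketch below implements this textbook form using toy inputs. The numbers are invented for illustration and are not BESIII yields or efficiencies. Any intermediate branching fractions on the signal side are assumed to be folded into the double-tag efficiency.

```python
from typing import Sequence

def double_tag_branching_fraction(n_dt: float, n_st: Sequence[float],
                                  eff_st: Sequence[float], eff_dt: Sequence[float]) -> float:
    """B_sig = N_DT / sum_i (N_ST,i * eps_DT,i / eps_ST,i), summed over tag modes i.

    n_dt:   signal yield in the double-tag sample (all tag modes combined)
    n_st:   single-tag yield per tag mode
    eff_st: single-tag efficiency per tag mode
    eff_dt: double-tag efficiency per tag mode (tag side and signal side together)
    """
    if not (len(n_st) == len(eff_st) == len(eff_dt)):
        raise ValueError("need one single-tag yield and two efficiencies per tag mode")
    effective_tags = sum(n * e_dt / e_st for n, e_st, e_dt in zip(n_st, eff_st, eff_dt))
    return n_dt / effective_tags

# Toy inputs (NOT the measured values): two tag modes.
print(double_tag_branching_fraction(30, [100_000, 50_000], [0.5, 0.4], [0.02, 0.012]))
```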

Submitted 8 May, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

Comments: 15 pages, 20 figures

Journal ref: Phys. Rev. D 109, 052001 (2024)

arXiv:2311.01076 [pdf, other]

Search for a muonphilic scalar $X_{0}$ or vector $X_{1}$ via $J/ψ\toμ^+μ^-+\rm{invisible}$ decays at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (608 additional authors not shown)

Abstract: Light scalar $X_{0}$ or vector $X_{1}$ particles have been introduced as a possible explanation for the $(g-2)_μ$ anomaly and dark matter phenomena. Using $(8.998\pm 0.039)\times10^9$ $J/ψ$ events collected by the BESIII detector, we search for a light muonphilic scalar $X_{0}$ or vector $X_{1}$ in the processes $J/ψ\toμ^+μ^- X_{0,1}$ with invisible $X_{0,1}$ decays. No obvious signal is found, and upper limits on the coupling $g_{0,1}'$ between the muon and the $X_{0,1}$ particles are set between $1.1\times10^{-3}$ and $1.0\times10^{-2}$ for $X_{0,1}$ masses in the range $1<M(X_{0,1})<1000$ MeV$/c^2$ at the 90% confidence level.

Submitted 18 February, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 9 pages, 7 figures

arXiv:2311.00286 [pdf, other]

JADE: A Linguistics-based Safety Evaluation Platform for Large Language Models

Authors: Mi Zhang, Xudong Pan, Min Yang

Abstract: In this paper, we present JADE, a targeted linguistic fuzzing platform that increases the linguistic complexity of seed questions to simultaneously and consistently break a wide range of widely used LLMs, categorized into three groups: eight open-source Chinese, six commercial Chinese, and four commercial English LLMs. JADE generates three safety benchmarks for the three groups of LLMs, which contain highly threatening unsafe questions. These questions simultaneously trigger harmful generation from multiple LLMs, with an average unsafe generation ratio of 70%, while remaining natural and fluent and preserving the core unsafe semantics. We release the benchmark demos generated for commercial English LLMs and open-source English LLMs at the following link: https://github.com/whitzard-ai/jade-db. Readers interested in evaluating more questions generated by JADE are invited to contact us. JADE is based on Noam Chomsky's seminal theory of transformational-generative grammar. Given a seed question with unsafe intention, JADE invokes a sequence of generative and transformational rules to increase the complexity of the syntactic structure of the original question until the safety guardrail is broken. Our key insight is that, due to the complexity of human language, most of the current best LLMs can hardly recognize the invariant evil across the infinite number of different syntactic structures, which form an unbounded example space that can never be fully covered.
Technically, the generative/transformational rules are constructed by native speakers of the languages and, once developed, can be used to automatically grow and transform the parse tree of a given question until the guardrail is broken. For more evaluation results and demos, please see our website: https://whitzard-ai.github.io/jade.html.

Submitted 10 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: A preprint work. Benchmark link: https://github.com/whitzard-ai/jade-db. Website link: https://whitzard-ai.github.io/jade.html

arXiv:2310.19852 [pdf, other]

AI Alignment: A Comprehensive Survey

Authors: Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

Abstract: AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do the risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose it into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update the website (www.alignmentsurvey.com), which features tutorials, collections of papers, blog posts, and other resources.

Submitted 1 May, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Continually updated, including weak-to-strong generalization and socio-technical thinking. 58 pages (excluding bibliography), 801 references

arXiv:2310.19509 [pdf, other]

SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity

Authors: Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen

Abstract: To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open problem. In this paper, we present a novel mobile inference acceleration framework, SparseByteNN, which leverages fine-grained kernel sparsity to achieve real-time execution as well as high accuracy. Our framework consists of two parts. (a) A fine-grained kernel sparsity schema with a sparsity granularity between structured pruning and unstructured pruning, which designs multiple sparse patterns for different operators; combined with our proposed whole-network rearrangement strategy, the schema achieves a high compression rate and high precision at the same time. (b) An inference engine co-optimized with the sparse patterns. The conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy curve. Experimental results on Qualcomm 855 show that for 30% sparse MobileNet-v1, SparseByteNN achieves a 1.27x speedup over the dense version and a 1.29x speedup over the state-of-the-art sparse inference engine MNN, with a slight accuracy drop of 0.224%.
The source code of SparseByteNN will be available at https://github.com/lswzjuer/SparseByteNN
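Group sparsity has a granularity between unstructured pruning (single weights) and structured pruning (whole channels). As a rough illustration, the Python sketch below applies generic group-wise magnitude pruning. It splits each row of weights into fixed-size groups and zeroes the groups with the smallest L2 norm. A matrix-vector product then skips the pruned groups entirely. The group shape, the norm criterion, and the function names are assumptions. SparseByteNN's operator-specific sparse patterns, whole-network rearrangement, and ARM/WebAssembly kernels are not reproduced here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[bool]]

def group_prune(weights: Matrix, group_size: int, sparsity: float) -> Tuple[Matrix, Mask]:
    """Zero the lowest-L2-norm groups of `group_size` consecutive weights per row.

    `sparsity` is the fraction of groups removed across the whole matrix.
    Returns the pruned matrix and a per-group keep mask (True = kept).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    if any(len(row) % group_size for row in weights):
        raise ValueError("row length must be a multiple of group_size")
    scored = []  # (L2 norm, row index, group index)
    for r, row in enumerate(weights):
        for g in range(len(row) // group_size):
            chunk = row[g * group_size:(g + 1) * group_size]
            scored.append((math.sqrt(sum(w * w for w in chunk)), r, g))
    n_prune = int(round(sparsity * len(scored)))
    dropped = {(r, g) for _, r, g in sorted(scored)[:n_prune]}
    mask = [[(r, g) not in dropped for g in range(len(row) // group_size)]
            for r, row in enumerate(weights)]
    pruned = [[0.0 if (r, i // group_size) in dropped else w for i, w in enumerate(row)]
              for r, row in enumerate(weights)]
    return pruned, mask

def group_sparse_matvec(weights: Matrix, mask: Mask, group_size: int, x: List[float]) -> List[float]:
    """y = W x, visiting only kept groups, so work scales with the number of kept groups."""
    y = []
    for row, keep in zip(weights, mask):
        acc = 0.0
        for g, kept in enumerate(keep):
            if kept:
                base = g * group_size
                acc += sum(row[base + j] * x[base + j] for j in range(group_size))
        y.append(acc)
    return y
```

Pruning whole groups instead of individual weights keeps the memory access pattern regular. That regularity is what lets a sparse kernel turn the FLOP reduction into actual speedup.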

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.19277 [pdf, other]

Clustering based Multiple Anchors High-Dimensional Model Representation

Authors: Meixin Xiong, Liuhong Chen, Ju Ming, Xingchen Pan, Xinyu Yan

Abstract: In this work, a cut high-dimensional model representation (cut-HDMR) expansion based on multiple anchors is constructed via a clustering method. Specifically, a set of random input realizations is drawn from the parameter space and grouped by the centroidal Voronoi tessellation (CVT) method. Then, for each cluster, the centroid is set as the reference (anchor) point, so the corresponding zeroth-order term can be determined directly. For the non-zero-order terms of each cut-HDMR, a set of discrete points is selected for each input component, and the Lagrange interpolation method is applied. For a new input, the cut-HDMR corresponding to the nearest centroid is used to compute its response. Numerical experiments on a high-dimensional integral and an elliptic stochastic partial differential equation show that the CVT-based multiple-anchor cut-HDMR can alleviate the negative impact of a single inappropriate anchor point and achieves higher accuracy than the average of several expansions.
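The procedure described above can be sketched in Python as follows. Lloyd (k-means) iterations approximate the CVT of the samples. Each centroid then serves as the anchor of a first-order cut-HDMR, whose one-dimensional component functions are Lagrange-interpolated on a fixed node set. A new input is evaluated with the expansion belonging to its nearest centroid. Truncating at first order, the node set, and all names are assumptions made for brevity. The abstract does not state the expansion order or how the discrete points are chosen.

```python
import random
from typing import Callable, List, Sequence

Point = List[float]
Func = Callable[[Sequence[float]], float]

def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids: Sequence[Point], p: Sequence[float]) -> int:
    """Index of the centroid closest to p (Euclidean)."""
    return min(range(len(centroids)), key=lambda j: sqdist(centroids[j], p))

def lloyd_cvt(samples: List[Point], k: int, iters: int = 50, seed: int = 0) -> List[Point]:
    """Approximate a centroidal Voronoi tessellation of the samples with Lloyd iterations."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(samples, k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in samples:
            clusters[nearest(centroids, p)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def lagrange(nodes: Sequence[float], values: Sequence[float], t: float) -> float:
    """Evaluate the Lagrange interpolating polynomial through (nodes, values) at t."""
    total = 0.0
    for i, (ti, vi) in enumerate(zip(nodes, values)):
        basis = 1.0
        for j, tj in enumerate(nodes):
            if j != i:
                basis *= (t - tj) / (ti - tj)
        total += vi * basis
    return total

class FirstOrderCutHDMR:
    """f(x) ~ f(c) + sum_i [g_i(x_i) - f(c)], where g_i varies only input i around anchor c."""

    def __init__(self, f: Func, anchor: Sequence[float], nodes: Sequence[float]):
        self.nodes = list(nodes)
        self.f0 = f(list(anchor))  # zeroth-order term: f evaluated at the anchor
        self.tables = []  # g_i sampled at the interpolation nodes
        for i in range(len(anchor)):
            row = []
            for t in self.nodes:
                x = list(anchor)
                x[i] = t
                row.append(f(x))
            self.tables.append(row)

    def __call__(self, x: Sequence[float]) -> float:
        return self.f0 + sum(lagrange(self.nodes, vals, xi) - self.f0
                             for xi, vals in zip(x, self.tables))

class MultiAnchorCutHDMR:
    """One cut-HDMR per CVT centroid; a query uses the expansion of its nearest centroid."""

    def __init__(self, f: Func, samples: List[Point], k: int, nodes: Sequence[float], seed: int = 0):
        self.centroids = lloyd_cvt(samples, k, seed=seed)
        self.experts = [FirstOrderCutHDMR(f, c, nodes) for c in self.centroids]

    def __call__(self, x: Sequence[float]) -> float:
        return self.experts[nearest(self.centroids, x)](x)
```

For an additive function whose components are polynomials of degree below the number of nodes, every first-order expansion is exact, whichever anchor is chosen. The choice of anchor matters only once interaction terms are truncated, which is the situation multiple anchors are meant to mitigate.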

Submitted 10 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.19210 [pdf, other]

Generalized Category Discovery with Clustering Assignment Consistency

Authors: Xiangli Yang, Xinglin Pan, Irwin King, Zenglin Xu

Abstract: Generalized category discovery (GCD) is a recently proposed open-world task. Given a set of images consisting of labeled and unlabeled instances, the goal of GCD is to automatically cluster the unlabeled samples using information transferred from the labeled dataset. The unlabeled dataset comprises both known and novel classes. The main challenge is that unlabeled novel-class samples and unlabeled known-class samples are mixed together in the unlabeled dataset. To address GCD without knowing the number of classes in the unlabeled dataset, we propose a co-training-based framework that encourages clustering consistency. Specifically, we first introduce weak and strong augmentation transformations to generate two sufficiently different views of the same sample. Then, based on the co-training assumption, we propose a consistency representation learning strategy, which encourages consistency between feature-prototype similarity and clustering assignment. Finally, we use the discriminative embeddings learned from the semi-supervised representation learning process to construct an original sparse network and use a community detection method to obtain the clustering results and the number of categories simultaneously. Extensive experiments show that our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets. In particular, on the ImageNet-100 dataset, our method significantly exceeds the best baseline by 15.5% and 7.0% on the Novel and All classes, respectively.
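The final step above builds a sparse graph over the learned embeddings and then runs community detection to obtain both the clusters and their number. It can be sketched in Python as below. The sketch builds a k-nearest-neighbour graph and, as a deliberately simple stand-in for a community detection algorithm, labels its connected components. The choice of k, the Euclidean metric, and the use of connected components are assumptions, not the authors' method. A real community detection algorithm can also split a single connected component into several communities.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Graph = Dict[int, Set[int]]

def knn_graph(embeddings: Sequence[Sequence[float]], k: int) -> Graph:
    """Sparse undirected graph linking each point to its k nearest neighbours (Euclidean)."""
    adj: Graph = {i: set() for i in range(len(embeddings))}
    for i, e in enumerate(embeddings):
        ranked = sorted((math.dist(e, other), j) for j, other in enumerate(embeddings) if j != i)
        for _, j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize the edge
    return adj

def connected_components(adj: Graph) -> List[int]:
    """Label each node with its component index (iterative depth-first search)."""
    labels = [-1] * len(adj)
    n_components = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = n_components
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = n_components
                    stack.append(v)
        n_components += 1
    return labels

def discover_categories(embeddings: Sequence[Sequence[float]], k: int = 2) -> Tuple[List[int], int]:
    """Return cluster labels and the inferred number of categories."""
    labels = connected_components(knn_graph(embeddings, k))
    return labels, (max(labels) + 1 if labels else 0)
```

The number of categories falls out of the graph structure rather than being fixed in advance. This is the property the abstract relies on when the class count of the unlabeled data is unknown.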

Submitted 29 October, 2023; originally announced October 2023.

Comments: ICONIP 2023. This paper has been nominated for the ICONIP 2023 Best Paper Award

arXiv:2310.17937 [pdf, ps, other]

doi 10.1103/PhysRevLett.132.151901

Observation of the Anomalous Shape of $X(1840)$ in $J/ψ\rightarrow γ3(π^+ π^-)$ Indicating a Second Resonance Near $p\bar{p}$ Threshold

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (604 additional authors not shown)

Abstract: Using a sample of $(10087\pm44)\times 10^6$ $J/ψ$ events, which is about 45 times larger than that previously analyzed, a further investigation of the $J/ψ\rightarrow γ3(π^+π^-)$ decay is performed. A significant distortion at 1.84 GeV/$c^2$ in the line shape of the $3(π^+π^-)$ invariant mass spectrum is observed for the first time, which could be resolved by two overlapping resonant structures, $X(1840)$ and $X(1880)$. The new state $X(1880)$ is observed with a statistical significance larger than $10σ$. The mass and width of $X(1880)$ are determined to be $1882.1\pm1.7\pm0.7$ MeV/$c^2$ and $30.7\pm5.5 \pm2.4$ MeV, respectively, which indicates the existence of a $p\bar{p}$ bound state.

Submitted 15 April, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Journal ref: Phys. Rev. Lett. 132, 151901 (2024)

arXiv:2310.15601 [pdf, ps, other]

doi 10.1103/PhysRevD.109.032011

Study of the doubly Cabibbo-suppressed decays $D^+_s\to K^+K^+π^-$ and $D^+_s\to K^+K^+π^-π^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko , et al. (604 additional authors not shown)

Abstract: Based on 7.33 fb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies between 4.128 and 4.226 GeV with the BESIII detector, experimental studies of the doubly Cabibbo-suppressed decays $D^+_s\to K^+K^+π^-$ and $D^+_s\to K^+K^+π^-π^0$ are reported. We determine the absolute branching fraction of $D^+_s\to K^+K^+π^-$ to be $(1.23^{+0.28}_{-0.25}\,({\rm stat})\pm0.06\,({\rm syst}))\times 10^{-4}$. No significant signal of $D^+_s\to K^+K^+π^-π^0$ is observed, and the upper limit on its branching fraction at the 90% confidence level is set to $1.7\times10^{-4}$.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 10 pages, 4 figures, 4 tables

Report number: BAM-00695

Journal ref: Phys. Rev. D 109, 032011 (2024)

arXiv:2310.15072 [pdf, other]

doi 10.1109/TVCG.2024.3353263

RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments

Authors: **yu Li, Xiaokun Pan, Gan Huang, Ziyang Zhang, Nan Wang, Hujun Bao, Guofeng Zhang

Abstract: It is typically challenging for visual or visual-inertial odometry systems to handle dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these problems. First, we propose an IMU-PARSAC algorithm that can robustly detect and match keypoints in a two-stage process. In the first stage, landmarks are matched with new keypoints using visual and IMU measurements. We collect statistical information from this matching and use it to guide the intra-keypoint matching in the second stage. Second, to handle pure rotation, we detect the motion type and adapt the deferred-triangulation technique during the data-association process. We treat pure-rotational frames as special subframes; when solving the visual-inertial bundle adjustment, they provide additional constraints on the pure-rotational motion. We evaluate the proposed VIO system on public datasets and through online comparison. Experiments show that RD-VIO has clear advantages over other methods in dynamic environments. The source code is available at: https://github.com/openxrlab/xrslam.
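One simple way to flag pure rotation is to rotate the previous frame's feature bearing vectors by the IMU-integrated rotation. If the camera only rotated, the rotated bearings line up with the current bearings and the remaining parallax is near zero. The Python sketch below implements this check. It is a hypothetical illustration, not RD-VIO's actual motion-type detector. The 0.5° threshold, the rotation convention (R maps previous-frame vectors into the current frame), and the function names are assumptions.

```python
import math
from statistics import median
from typing import List, Sequence

Vec = List[float]
Mat = Sequence[Sequence[float]]

def unit(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def matvec(R: Mat, v: Sequence[float]) -> Vec:
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def angle_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between two 3-D directions, clamped for numerical safety."""
    ua, ub = unit(a), unit(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    return math.degrees(math.acos(dot))

def is_pure_rotation(bearings_prev: Sequence[Vec], bearings_curr: Sequence[Vec],
                     R_imu: Mat, threshold_deg: float = 0.5) -> bool:
    """Rotation-compensated parallax test: under pure rotation, R * b_prev ~ b_curr."""
    residuals = [angle_deg(matvec(R_imu, b0), b1) for b0, b1 in zip(bearings_prev, bearings_curr)]
    return median(residuals) < threshold_deg  # median resists a few dynamic-object outliers
```

Using the median rather than the mean keeps a handful of features on moving objects from flipping the decision. This is a small nod to the dynamic-scene setting the paper targets.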

Submitted 16 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Journal ref: IEEE Transactions on Visualization and Computer Graphics, 2024

arXiv:2310.14585 [pdf, other]

doi 10.1103/PhysRevD.108.092011

Observation of the $ψ(3686)$ decays into $Σ^{+}\barΣ^{-}ω$ and $Σ^{+}\barΣ^{-}φ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (604 additional authors not shown)

Abstract: Based on $(27.08\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the $ψ(3686)\toΣ^{+}\barΣ^{-}ω$ and $Σ^{+}\barΣ^{-}φ$ decays are observed for the first time with statistical significances of 13.8$σ$ and 7.6$σ$, respectively. The corresponding branching fractions are measured to be $\mathcal{B}(ψ(3686)\toΣ^{+}\barΣ^{-}ω)=(1.90 \pm 0.18 \pm 0.21) \times 10^{-5}$ and $\mathcal{B}(ψ(3686)\toΣ^{+}\barΣ^{-}φ)=(2.96 \pm 0.54 \pm 0.41) \times 10^{-6}$, where the first uncertainties are statistical and the second systematic.

Submitted 23 October, 2023; originally announced October 2023.

Comments: 10 pages

Journal ref: Phys. Rev. D 108, 092011 (2023)

arXiv:2310.13250 [pdf, other]

Diagnosis-oriented Medical Image Compression with Efficient Transfer Learning

Authors: Guangqi Xie, Xin Li, Xiaohan Pan, Zhibo Chen

Abstract: Remote medical diagnosis has emerged as a critical and indispensable technique in practical medical systems, where medical data must be efficiently compressed and transmitted for diagnosis by either professional doctors or intelligent diagnosis devices. In this process, a large amount of redundant content irrelevant to the diagnosis is subjected to high-fidelity coding, leading to unnecessary transmission costs. To mitigate this, we propose diagnosis-oriented medical image compression, a special semantic compression task designed for medical scenarios that aims to reduce the compression cost without compromising diagnosis accuracy. However, collecting sufficient medical data to optimize such a compression system is expensive and challenging due to privacy issues and the lack of professional annotation. In this study, we propose DMIC, the first efficient transfer learning-based codec for diagnosis-oriented medical image compression. DMIC can be effectively optimized with only a few annotated medical examples by reusing the knowledge in an existing reinforcement learning-based task-driven semantic coding framework, HRLVSC [1]. Concretely, we tune only part of the parameters of the policy network for bit allocation within HRLVSC, which enables it to adapt to medical images. We validate DMIC on a typical medical task, coronary artery segmentation. Extensive experiments demonstrate that DMIC achieves 47.594% BD-rate savings compared to the HEVC anchor by tuning only the A2C module (2.7% of the parameters) of the policy network with only 1 medical sample.

Submitted 19 October, 2023; originally announced October 2023.

Comments: Accepted by IEEE VCIP
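The 47.594% figure is a Bjøntegaard delta rate (BD-rate). The usual computation works in four steps. First, fit log-rate as a cubic polynomial of a quality metric for each codec. Second, integrate both fits over the quality range they share. Third, take the average gap in log-rate. Fourth, convert that gap to a percentage. The NumPy sketch below is a generic version of this calculation. The abstract does not state which quality metric DMIC's BD-rate uses, and the rate/quality points in the example are made up.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjontegaard delta rate in percent (negative = bitrate savings vs. the anchor)."""
    log_anchor = np.log10(np.asarray(rate_anchor, dtype=float))
    log_test = np.log10(np.asarray(rate_test, dtype=float))
    # Fit log10(rate) as a cubic polynomial of the quality metric for each codec.
    fit_anchor = np.polyfit(quality_anchor, log_anchor, 3)
    fit_test = np.polyfit(quality_test, log_test, 3)
    # Integrate only over the quality interval covered by both curves.
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_anchor = np.polyint(fit_anchor)
    int_test = np.polyint(fit_test)
    avg_anchor = (np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    # Convert the mean log-rate difference back into a relative rate change.
    return (10 ** (avg_test - avg_anchor) - 1) * 100

# Hypothetical curves: the test codec needs half the anchor's bitrate at every quality level.
anchor_rates = [100.0, 200.0, 400.0, 800.0]
quality = [30.0, 33.0, 36.0, 39.0]
test_rates = [r / 2 for r in anchor_rates]
print(round(bd_rate(anchor_rates, quality, test_rates, quality), 3))  # -50.0
```

Averaging in the log domain makes the result a geometric-mean rate ratio. This is why a codec that needs half the bitrate everywhere scores exactly -50%.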

arXiv:2310.12773 [pdf, other]

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Authors: Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

Abstract: With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical. However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training. To address this issue, we propose Safe Reinforcement Learning from Human Feedback (Safe RLHF), a novel algorithm for human value alignment. Safe RLHF explicitly decouples human preferences regarding helpfulness and harmlessness, effectively avoiding crowdworkers' confusion about the tension and allowing us to train separate reward and cost models. We formalize the safety concern of LLMs as an optimization task of maximizing the reward function while satisfying specified cost constraints. Leveraging the Lagrangian method to solve this constrained problem, Safe RLHF dynamically adjusts the balance between the two objectives during fine-tuning. Through three rounds of fine-tuning using Safe RLHF, we demonstrate a superior ability to mitigate harmful responses while enhancing model performance compared to existing value-aligned algorithms. Experimentally, we fine-tuned Alpaca-7B using Safe RLHF and aligned it with collected human preferences, significantly improving its helpfulness and harmlessness according to human evaluations.

Submitted 19 October, 2023; originally announced October 2023.
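The constrained formulation above (maximize reward subject to a cost limit, solved with the Lagrangian method) can be illustrated with generic primal-dual gradient updates. This is a hypothetical scalar toy problem and not Safe RLHF's reward model, cost model or training code. The objective, cost function, limit and step sizes are all assumptions, chosen so that the answer can be checked: the constrained optimum is θ = 1, where the multiplier settles at λ = 4.

```python
def lagrangian_dual_ascent(steps=5000, lr_theta=0.01, lr_lambda=0.01, cost_limit=1.0):
    """Toy constrained problem: maximize J_R(theta) = -(theta - 3)^2
    subject to J_C(theta) = theta <= cost_limit.

    Uses L(theta, lam) = J_R(theta) - lam * (J_C(theta) - cost_limit):
    gradient ascent in theta, projected gradient steps on lam >= 0.
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to theta.
        grad_theta = -2.0 * (theta - 3.0) - lam
        theta += lr_theta * grad_theta
        # Dual step: lam grows while the cost exceeds the limit and shrinks otherwise,
        # then is projected back onto lam >= 0.
        lam = max(0.0, lam + lr_lambda * (theta - cost_limit))
    return theta, lam

theta, lam = lagrangian_dual_ascent()
print(round(theta, 3), round(lam, 3))  # 1.0 4.0
```

When the cost is over budget, λ rises and pulls θ back toward the constraint. When the constraint is slack, λ decays toward zero. This mirrors, in spirit only, the dynamic balancing of the two objectives that the abstract describes.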

arXiv:2310.12670 [pdf, other]

Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining

Authors: Yuxin Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Xinglin Pan, Yang Zheng, Xiaoyu Wu, Amelie Chi Zhou, Bingsheng He, Xiaowen Chu

Abstract: Extensive system scales (e.g., thousands of GPUs/TPUs) and prolonged training periods (e.g., months of pretraining) significantly increase the probability of failures when training large language models (LLMs). Thus, efficient and reliable fault-tolerance methods are urgently needed. Checkpointing is the primary fault-tolerance method: it periodically saves parameter snapshots from GPU memory to disks via CPU memory. In this paper, we identify that the frequency of existing checkpoint-based fault tolerance is significantly limited by storage I/O overheads, which results in hefty re-training costs when restarting from the nearest checkpoint. In response to this gap, we introduce an in-memory fault-tolerance framework for large-scale LLM pretraining. The framework boosts the efficiency and reliability of fault tolerance in three aspects. (1) Reduced Data Transfer and I/O: By asynchronously caching parameters (sharded model parameters, optimizer states, and RNG states) to volatile CPU memory, our framework significantly reduces communication costs and bypasses checkpoint I/O. (2) Enhanced System Reliability: Our framework enhances parameter protection with a two-layer hierarchy: snapshot management processes (SMPs) safeguard against software failures, together with Erasure Coding (EC) protecting against node failures. This double-layered protection greatly improves the survival probability of the parameters compared to existing checkpointing methods. (3) Improved Snapshotting Frequency: Our framework achieves more frequent snapshotting than asynchronous checkpointing optimizations under the same saving-time budget, which improves fault-tolerance efficiency. Empirical results demonstrate that our framework minimizes the overhead of fault tolerance in LLM pretraining by effectively leveraging redundant CPU resources.

Submitted 19 October, 2023; originally announced October 2023.

Comments: Fault Tolerance, Checkpoint Optimization, Large Language Model, 3D parallelism
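The abstract does not specify which erasure code the framework uses or how shards are placed across nodes. The simplest erasure code illustrates the principle: it keeps one XOR parity shard, so any single lost node's shard can be rebuilt from the others. Production systems typically use Reed-Solomon codes, which tolerate several simultaneous losses. This sketch is only an illustration. The shard contents and function names are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("shards must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def encode_parity(shards):
    """Single parity shard: XOR of all data shards (a RAID-5 style erasure code)."""
    return reduce(xor_bytes, shards)

def recover_missing(shards, parity):
    """Rebuild the one missing shard (marked None) from the survivors and the parity."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) != 1:
        raise ValueError("a single parity shard can rebuild exactly one lost shard")
    survivors = [s for s in shards if s is not None]
    rebuilt = reduce(xor_bytes, survivors, parity)  # parity ^ survivors = lost shard
    restored = list(shards)
    restored[missing[0]] = rebuilt
    return restored

# Hypothetical per-node snapshot shards held in CPU memory.
shards = [b"node0-params", b"node1-params", b"node2-params"]
parity = encode_parity(shards)
after_failure = [shards[0], None, shards[2]]  # node 1 lost its snapshot
print(recover_missing(after_failure, parity)[1])  # b'node1-params'
```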

arXiv:2310.12567 [pdf, other]

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

Authors: Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang

Abstract: Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces obstacles due to substantial safety concerns. Safe reinforcement learning (SafeRL) emerges as a solution to optimize policies while simultaneously adhering to multiple constraints, thereby addressing the challenge of integrating reinforcement learning in safety-critical scenarios. In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. Additionally, we offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms. This comprehensive library can serve as a validation tool for the research community. By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications. The website of this project can be accessed at https://sites.google.com/view/safety-gymnasium.

Submitted 6 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Published at NeurIPS 2023

arXiv:2310.10452 [pdf, other]

doi 10.1103/PhysRevD.108.L111101

Measurement of the cross sections for $e^+e^-\toηπ^+π^-$ at center-of-mass energies between 2.00 and 3.08 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (605 additional authors not shown)

Abstract: Using data samples collected at center-of-mass energies between 2.000 and 3.080 GeV with the BESIII detector operating at the BEPCII collider, a partial-wave analysis is performed on the process $e^+e^-\toηπ^+π^-$. In addition to the dominant $e^+e^-\toρη$ component, the $e^+e^-\to a_2(1320)π$ process is also sizeable, contributing up to 24% of the total reaction. The measured cross sections of the process $e^+e^-\toηπ^+π^-$ are systematically higher than those of BaBar by more than $3σ$ at center-of-mass energies between 2.000 and 2.300 GeV. In the cross section lineshape for $e^+e^-\to a_2(1320)π$, a resonant structure is observed with a significance of $5.5σ$, with $M=(2044\pm31\pm4)$ MeV/$c^2$, $Γ=(163\pm69\pm24)$ MeV and $\mathcal{B}_{R}\cdotΓ_{e^+e^-}^{R}=(34.6\pm17.1\pm6.0)$ eV or $(137.1\pm73.3\pm2.1)$ eV. In the cross section lineshape for $e^+e^-\toρη$, evidence of a dip structure around 2180 MeV/$c^2$ is observed with a statistical significance of $3.0σ$.

Submitted 28 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Journal ref: Phys. Rev. D 108, L111101 (2023)

arXiv:2310.08710 [pdf, other]

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

Authors: Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, John D. Co-Reyes, Rishabh Agarwal, Rebecca Roelofs, Yao Lu, Nico Montali, Paul Mougin, Zoey Yang, Brandyn White, Aleksandra Faust, Rowan McAllister, Dragomir Anguelov, Benjamin Sapp

Abstract: Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.07577 [pdf]

Impact of resource availability and conformity effect on sustainability of common-pool resources

Authors: Chengyi Tu, Renfei Chen, Ying Fan, Xuwei Pan

Abstract: Sustainability of common-pool resources hinges on the interplay between human and environmental systems. However, there is still no comprehensive framework for modeling the extraction of common-pool resources and the cooperation of human agents that can account for the different factors shaping system behavior and outcomes. In particular, we still lack a critical value for ensuring resource sustainability under different scenarios. In this paper, we present a novel framework for studying resource extraction and cooperation in human-environmental systems for common-pool resources. We explore how different factors, such as resource availability and the conformity effect, influence the players' decisions and the resource outcomes. We identify critical values for ensuring resource sustainability under various scenarios. We demonstrate that the observed phenomena are robust to the complexity and assumptions of the models, and we discuss the implications of our study for policy and practice, as well as its limitations and directions for future research.

Submitted 17 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.07277 [pdf, other]

Search for $J/ψ$ weak decays containing $D$ meson

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (600 additional authors not shown)

Abstract: Using a sample of about 10 billion $J/ψ$ events collected with the BESIII detector, we search for the weak decays $J/ψ\to \bar{D}^0π^0 + c.c.$, $J/ψ\to \bar{D}^0η+ c.c.$, $J/ψ\to \bar{D}^0ρ^0 + c.c.$, $J/ψ\to D^-π^+ + c.c.$, and $J/ψ\to D^-ρ^+ + c.c.$. Since no significant signal is observed, we set upper limits on the branching fractions of these decays at the 90% confidence level: $\mathcal{B}(J/ψ\to \bar{D}^0π^0 + c.c.) < 4.7 \times 10^{-7}$, $\mathcal{B}(J/ψ\to \bar{D}^0η+ c.c.) < 6.8 \times 10^{-7}$, $\mathcal{B}(J/ψ\to \bar{D}^0ρ^0 + c.c.) < 5.2 \times 10^{-7}$, $\mathcal{B}(J/ψ\to D^-π^+ + c.c.) < 7.0 \times 10^{-8}$, and $\mathcal{B}(J/ψ\to D^-ρ^+ + c.c.) < 6.0 \times 10^{-7}$.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 10 pages, 17 figures

arXiv:2310.05556 [pdf, other]

WeatherDepth: Curriculum Contrastive Learning for Self-Supervised Depth Estimation under Adverse Weather Conditions

Authors: Jiyuan Wang, Chunyu Lin, Lang Nie, Shujun Huang, Yao Zhao, Xing Pan, Rui Ai

Abstract: Depth estimation models have shown promising performance on clear scenes but fail to generalize to adverse weather conditions due to illumination variations, weather particles, etc. In this paper, we propose WeatherDepth, a self-supervised robust depth estimation model with curriculum contrastive learning, to tackle performance degradation in complex weather conditions. Concretely, we first present a progressive curriculum learning scheme with three simple-to-complex curricula that gradually adapts the model from clear scenes to relatively adverse scenes and then to adverse weather scenes. It encourages the model to gradually grasp beneficial depth cues against the weather effect, yielding smoother and better domain adaptation. Meanwhile, to prevent the model from forgetting previous curricula, we integrate contrastive learning into different curricula. By drawing reference knowledge from the previous course, our strategy establishes a depth consistency constraint between different courses toward robust depth estimation in diverse weather. In addition, to reduce manual intervention and better adapt to different models, we design an adaptive curriculum scheduler that automatically searches for the best timing for course switching. In experiments, the proposed solution is shown to be easily incorporated into various architectures and demonstrates state-of-the-art (SoTA) performance on both synthetic and real weather datasets. Source code and data are available at https://github.com/wangjiyuan9/WeatherDepth.

Submitted 17 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: 6 pages, accepted by ICRA 2024

Journal ref: ICRA 2024
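The abstract says WeatherDepth uses an adaptive curriculum scheduler to decide when to switch between its three curricula, but it does not give the rule. One plausible stand-in is a plateau rule, similar in spirit to learning-rate-on-plateau schedulers. It advances to the next, harder curriculum once the monitored loss stops improving. This rule is an assumption and is not taken from the paper. The class name, `patience` and `min_delta` are hypothetical.

```python
class PlateauCurriculumScheduler:
    """Hypothetical rule: move to the next (harder) curriculum stage once the
    monitored loss has not improved by min_delta for `patience` consecutive checks."""

    def __init__(self, num_stages=3, patience=2, min_delta=0.01):
        self.num_stages = num_stages
        self.patience = patience
        self.min_delta = min_delta
        self.stage = 0
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss):
        """Record one validation loss and return the stage to train on next."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        if self.bad_checks >= self.patience and self.stage < self.num_stages - 1:
            self.stage += 1           # e.g., clear -> relatively adverse -> adverse
            self.best = float("inf")  # losses on a new stage are not comparable
            self.bad_checks = 0
        return self.stage

scheduler = PlateauCurriculumScheduler()
print([scheduler.step(x) for x in [1.0, 0.8, 0.795, 0.80, 0.9]])  # [0, 0, 0, 1, 1]
```

The best loss is reset on each switch because the harder curriculum usually raises the loss. Comparing it against the previous stage's best would otherwise trigger an immediate second switch.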

arXiv:2310.03361 [pdf, other]

doi 10.1103/PhysRevD.109.092012

Measurement of $e^{+}e^{-}\rightarrowηJ/ψ$ Cross Section from $\sqrt{s}=$ 3.808 GeV to 4.951 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (608 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of 22.42 fb$^{-1}$ collected by the BESIII detector operating at the BEPCII storage ring, we measure the cross sections of the $e^{+}e^{-}\rightarrowηJ/ψ$ process at center-of-mass energies from 3.808 to 4.951 GeV. Three structures are observed in the line shape of the measured cross sections. A maximum-likelihood fit with $ψ(4040)$, two additional resonances, and a non-resonant component is performed. The mass and width of the first additional state are $(4219.7\pm2.5\pm4.5) \rm{MeV}/\rm{c}^2$ and $(80.7\pm4.4\pm1.4) \rm{MeV}$, respectively, consistent with the $ψ(4230)$. For the second state, the mass and width are $(4386\pm13\pm17) \rm{MeV}/\rm{c}^2$ and $(177\pm32\pm13) \rm{MeV}$, respectively, consistent with the $ψ(4360)$. The first uncertainties are statistical and the second ones are systematic. The statistical significance of $ψ(4040)$ is $8.0σ$ and those for $ψ(4230)$ and $ψ(4360)$ are more than $10.0σ$.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.02992 [pdf, other]

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Authors: Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

Abstract: Subject-driven image generation has recently made significant strides. However, current methods still fall short in diverse application scenarios, as they require test-time tuning and cannot accept interleaved multi-image and text input. These limitations keep them far from the ultimate goal of "image as a foreign language in image generation." This paper presents Kosmos-G, a model that leverages the advanced multimodal perception capabilities of Multimodal Large Language Models (MLLMs) to tackle this challenge. Our approach aligns the output space of the MLLM with CLIP using the textual modality as an anchor and performs compositional instruction tuning on curated data. Kosmos-G demonstrates an impressive capability for zero-shot subject-driven generation with interleaved multi-image and text input. Notably, the score distillation instruction tuning requires no modifications to the image decoder. This allows seamless substitution of CLIP and effortless integration with a myriad of U-Net techniques, ranging from fine-grained controls to personalized image decoder variants. We posit Kosmos-G as an initial attempt towards the goal of "image as a foreign language in image generation." The code can be found at https://aka.ms/Kosmos-G

Submitted 25 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Code: https://aka.ms/Kosmos-G Project Page: https://xichenpan.github.io/kosmosg

arXiv:2310.02676 [pdf, other]

PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting

Authors: Yu** Tang, Jiaming Zhou, Xiang Pan, Zeying Gong, Junwei Liang

Abstract: Accurate precipitation forecasting is a vital challenge of both scientific and societal importance. Data-driven approaches have emerged as a widely used solution for addressing this challenge. However, relying solely on data-driven approaches limits the modeling of the underlying physics, making accurate predictions difficult. Coupling AI-based post-processing techniques with traditional Numerical Weather Prediction (NWP) methods offers a more effective way to improve forecasting accuracy. Despite previous post-processing efforts, accurately predicting heavy rainfall remains challenging due to imbalanced precipitation data across locations and complex relationships between multiple meteorological variables. To address these limitations, we introduce PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting. We propose CAMT, a simple yet effective Channel Attention Enhanced Multi-task Learning framework with a specially designed weighted loss function. Its flexible design allows for easy plug-and-play integration with various backbones. Extensive experimental results on the proposed benchmark show that our method outperforms state-of-the-art methods by 6.3%, 4.7%, and 26.8% in rain Critical Success Index (CSI) on the three datasets, respectively. Most notably, our model is the first deep learning-based method to outperform traditional NWP approaches in extreme precipitation conditions. It shows improvements of 15.6%, 17.4%, and 31.8% over NWP predictions in heavy rain CSI on the respective datasets. These results highlight the potential of our model to reduce the severe consequences of extreme weather events.

Submitted 4 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 16 pages, 3 figures
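The results above are reported in the Critical Success Index (CSI), a standard categorical forecast-verification score. For a thresholded event such as "rain at or above some amount", CSI equals hits divided by the sum of hits, misses and false alarms. The sketch below computes CSI for a single threshold. The threshold and rainfall values are hypothetical, and the abstract does not state the thresholds PostRainBench uses for "rain" and "heavy rain".

```python
import math

def critical_success_index(pred, obs, threshold):
    """CSI = hits / (hits + misses + false alarms) for the event 'rain >= threshold'."""
    if len(pred) != len(obs):
        raise ValueError("pred and obs must have the same length")
    hits = misses = false_alarms = 0
    for p, o in zip(pred, obs):
        predicted, observed = p >= threshold, o >= threshold
        if predicted and observed:
            hits += 1
        elif observed:
            misses += 1
        elif predicted:
            false_alarms += 1
    denom = hits + misses + false_alarms
    # Undefined when the event is neither forecast nor observed anywhere.
    return hits / denom if denom else math.nan

# Hypothetical rainfall amounts (mm) at four grid points.
forecast = [0.0, 2.0, 5.0, 0.5]
observed = [0.0, 3.0, 0.2, 4.0]
print(round(critical_success_index(forecast, observed, threshold=1.0), 4))  # 0.3333
```

CSI ignores correct negatives. This makes it informative for rare events like heavy rain, where a forecast of "no rain" everywhere would otherwise look deceptively accurate.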

arXiv:2310.00720 [pdf, other]

doi 10.1103/PhysRevC.109.L052201

First measurement of $ΛN$ inelastic scattering with $Λ$ from $e^{+} e^{-} \rightarrow J/ψ\to Λ\barΛ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (626 additional authors not shown)

Abstract: Using an $e^+ e^-$ collision data sample of $(10087 \pm 44)\times10^6 ~J/ψ$ events taken at the center-of-mass energy of $3.097~\rm{GeV}$ by the BESIII detector at the BEPCII collider, the process $Λ+N \rightarrow Σ^+ + X$ is studied for the first time employing a novel method. The $Σ^{+}$ hyperons are produced by the collisions of $Λ$ hyperons from $J/ψ$ decays with nuclei in the material of the BESIII detector. The total cross section of $Λ+ ^{9}{\rm Be} \rightarrow Σ^+ + X$ is measured to be $σ= (37.3 \pm 4.7 \pm 3.5)~{\rm mb}$ at $Λ$ beam momenta within $[1.057, 1.091]~{\rm GeV}/c$, where the uncertainties are statistical and systematic, respectively. This analysis is the first study of $Λ$-nucleon interactions at an $e^+ e^-$ collider, providing information and constraints relevant for the strong-interaction potential, the origin of color confinement, the unified model for baryon-baryon interactions, and the internal structure of neutron stars.

Submitted 1 October, 2023; originally announced October 2023.

arXiv:2310.00543 [pdf]

Elucidating Dynamic Conductive State Changes in Amorphous Lithium Lanthanum Titanate for Resistive Switching Devices

Authors: Ryosuke Shimizu, Diyi Cheng, Guomin Zhu, Bing Han, Thomas S. Marchese, Randall Burger, Mingjie Xu, Xiaoqing Pan, Minghao Zhang, Ying Shirley Meng

Abstract: Novel resistive switching materials are attracting attention as replacements for conventional Si-based transistors and as a route to neuromorphic computing that can surpass the limits of current von Neumann computing in the era of the Internet of Things (IoT). Materials previously used in batteries have demonstrated metal-insulator transitions upon electrical biasing due to the resulting compositional change. This property is desirable for future resistive switching devices. Amorphous lithium lanthanum titanate (a-LLTO) was originally developed as a solid-state electrolyte with relatively high lithium ionic conductivity and low electronic conductivity among oxide-type solid electrolytes. However, it has been suggested that the electrical conductivity of a-LLTO changes depending on oxygen content. In this work, the switching behavior of a-LLTO was investigated using a range of voltage sweep techniques, ultimately establishing a stable and optimal operating condition within the voltage window of -3.5 V to 3.5 V. This voltage range balances a substantial resistance change of three orders of magnitude against the need to avoid LLTO decomposition. This switching behavior is also confirmed in a Ni/LLTO/Ni nanodevice through in-situ biasing inside a focused-ion beam/scanning electron microscope (FIB-SEM). Experiments and computations with different LLTO compositions show that LLTO has two distinct conductivity states due to Ti reduction. The distribution of these two states is discussed using a simplified binary model, implying conductive filament growth during the low resistance state. Consequently, our study deepens the understanding of LLTO electronic properties and encourages the interdisciplinary application of battery materials for resistive switching devices.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2310.00492 [pdf, other]

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

Authors: Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu

Abstract: Large Language Models (LLMs) have achieved remarkable success, and instruction tuning is the critical step in aligning LLMs with user intentions. In this work, we investigate how instruction tuning adjusts pre-trained models, with a focus on intrinsic changes. Specifically, we first develop several local and global explanation methods, including a gradient-based method for input-output attribution and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models. This approach provides an internal perspective on model shifts at a human-comprehensible level. Our findings reveal three significant impacts of instruction tuning: 1) it empowers LLMs to recognize the instruction parts of user prompts and promotes response generation that is consistently conditioned on the instructions; 2) it encourages the self-attention heads to capture more word-word relationships about instruction verbs; 3) it encourages the feed-forward networks to rotate their pre-trained knowledge toward user-oriented tasks. These insights contribute to a more comprehensive understanding of instruction tuning and lay the groundwork for future work that aims at explaining and optimizing LLMs for various applications. Our code and data are publicly available at https://github.com/JacksonWuxs/Interpret_Instruction_Tuning_LLMs.

Submitted 4 April, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: Accepted by NAACL 2024

arXiv:2310.00322 [pdf, other]

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models

Authors: Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang

Abstract: Deployable Large Language Models (LLMs) must conform to the criterion of helpfulness and harmlessness, thereby achieving consistency between LLM outputs and human values. Red-teaming techniques constitute a critical way towards this criterion. Existing work relies solely on manual red team designs and heuristic adversarial prompts for vulnerability detection and optimization. These approaches lack rigorous mathematical formulation, thus limiting the exploration of diverse attack strategies within quantifiable measures and the optimization of LLMs under convergence guarantees. In this paper, we present Red-teaming Game (RTG), a general game-theoretic framework without manual annotation. RTG is designed for analyzing the multi-turn attack and defense interactions between Red-team Language Models (RLMs) and a Blue-team Language Model (BLM). Within the RTG, we propose the Gamified Red-teaming Solver (GRTS) with a diversity measure of the semantic space. GRTS is an automated red teaming technique that solves RTG towards a Nash equilibrium through meta-game analysis, which corresponds to the theoretically guaranteed optimization direction of both RLMs and BLM. Empirical results in multi-turn attacks with RLMs show that GRTS autonomously discovered diverse attack strategies and effectively improved the security of LLMs, outperforming existing heuristic red-team designs. Overall, RTG has established a foundational framework for red teaming tasks and constructed a new scalable oversight technique for alignment.
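To make "solving a meta-game towards Nash equilibrium" concrete, here is a minimal sketch. It is not GRTS. It runs classical fictitious play, a swapped-in textbook solver, on a small two-player zero-sum payoff matrix. Rows stand for hypothetical red-team policies and columns for hypothetical blue-team policies. The payoff values and the function name are assumptions for illustration.

```python
import numpy as np

def fictitious_play(payoff: np.ndarray, iters: int = 10000):
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    payoff[i, j] is the row player's (red team's) payoff when row policy i
    meets column policy j; the column player (blue team) receives -payoff[i, j].
    Returns the empirical mixed strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Start each player from an arbitrary pure strategy.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture so far.
        row_best = np.argmax(payoff @ (col_counts / col_counts.sum()))
        col_best = np.argmin((row_counts / row_counts.sum()) @ payoff)
        row_counts[row_best] += 1
        col_counts[col_best] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Hypothetical 3x3 meta-game shaped like rock-paper-scissors: no single
# red-team policy dominates, so the equilibrium is a mixture.
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
red_mix, blue_mix = fictitious_play(rps)
print(red_mix.round(2), blue_mix.round(2))  # both should approach uniform
```

In a PSRO-style loop, new policies would be trained against the current mixture and appended as new rows or columns. That outer loop is omitted here.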

Submitted 6 April, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.16639 [pdf, other]

doi 10.1145/3613904.3642790

MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

Authors: Ruolan Wu, Chun Yu, Xiaole Pan, Yujia Liu, Ningning Zhang, Yue Fu, Yuhan Wang, Zhi Zheng, Li Chen, Qiaolei Jiang, Xuhai Xu, Yuanchun Shi

Abstract: Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conducted a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leveraged large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We developed MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment app usage behaviors, physical contexts, mental states, goals and habits as input, and generates personalized and dynamic persuasive content with appropriate persuasion strategies. We conducted a 5-week field experiment (N=25) to compare MindShift with its simplified version (with mental states removed) and baseline techniques (fixed reminder). The results show that MindShift improves intervention acceptance rates by 4.7-22.5% and reduces smartphone usage duration by 7.4-9.8%. Moreover, users show a significant drop in smartphone addiction scale scores and a rise in self-efficacy scale scores. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.

Submitted 27 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Published at ACM CHI'24

MSC Class: 68U35 ACM Class: H.5.2; I.2.7

arXiv:2309.14689 [pdf, ps, other]

Updated measurements of the M1 transition $ψ(3686) \to γη_{c}(2S)$ with $η_{c}(2S) \to K \bar{K} π$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (609 additional authors not shown)

Abstract: Based on a data sample of $(27.08 \pm 0.14 ) \times 10^8~ψ(3686)$ events collected with the BESIII detector at the BEPCII collider, the M1 transition $ψ(3686) \to γη_{c}(2S)$ with $η_{c}(2S) \to K\bar{K}π$ is studied, where $K\bar{K}π$ is $K^{+} K^{-} π^{0}$ or $K_{S}^{0}K^{\pm}π^{\mp}$. The mass and width of the $η_{c}(2S)$ are measured to be $(3637.8 \pm 0.8 (\rm {stat}) \pm 0.2 (\rm {syst}))$ MeV/$c^{2}$ and $(10.5 \pm 1.7 (\rm {stat}) \pm 3.5 (\rm {syst}))$ MeV, respectively. The product branching fraction $\mathcal{B}\left(ψ(3686) \rightarrow γη_{c}(2 S)\right) \times \mathcal{B}(η_{c}(2 S) \rightarrow K \bar{K} π)$ is determined to be $(0.97 \pm 0.06 (\rm {stat}) \pm 0.09 (\rm {syst})) \times 10^{-5}$. Using $\mathcal{BR}(η_{c}(2S)\to K\bar{K}π)=(1.86^{+0.68}_{-0.49})\%$, we obtain the branching fraction of the radiative transition to be $\mathcal{BR}(ψ(3686) \to γη_{c}(2S)) = (5.2 \pm 0.3 (\rm {stat}) \pm 0.5 (\rm {syst}) ^{+1.9}_{-1.4} (extr)) \times 10^{-4}$, where the third uncertainty is due to the quoted $\mathcal{BR}(η_{c}(2S) \to K\bar{K}π)$.
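The quoted radiative branching fraction can be checked with a short piece of arithmetic. Divide the product branching fraction by the external $\mathcal{BR}(η_{c}(2S)\to K\bar{K}π)$. For the "extr" term, shift that input to its $+0.68$ and $-0.49$ edges. This is a sketch under our own reading. The edge-shifting treatment of the external uncertainty reproduces the quoted numbers but is an assumption, not a description of the collaboration's procedure. The function name is hypothetical.

```python
def radiative_bf(product, stat, syst, br, br_plus, br_minus):
    """Divide a product branching fraction by an external branching fraction.

    Returns the central value, the scaled statistical and systematic errors,
    and the asymmetric extrapolation errors obtained by moving `br` to its
    1-sigma edges (an assumed treatment of the external input).
    """
    central = product / br
    stat_err = stat / br
    syst_err = syst / br
    # A smaller divisor gives a larger result, so the lower edge of `br`
    # (br - br_minus) sets the upper extrapolation error.
    extr_plus = product / (br - br_minus) - central
    extr_minus = central - product / (br + br_plus)
    return central, stat_err, syst_err, extr_plus, extr_minus

# Inputs quoted in the abstract, written as plain fractions.
values = radiative_bf(product=0.97e-5, stat=0.06e-5, syst=0.09e-5,
                      br=1.86e-2, br_plus=0.68e-2, br_minus=0.49e-2)
for label, v in zip(("central", "stat", "syst", "+extr", "-extr"), values):
    print(f"{label:>7}: {v / 1e-4:.1f} x 10^-4")
```

Rounded to one decimal, this gives 5.2, 0.3, 0.5, +1.9 and -1.4 (in units of $10^{-4}$), matching the abstract.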

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.14667 [pdf, ps, other]

Investigation of the $ΔI = 1/2$ rule and test of CP violation through the measurement of decay asymmetry parameters in $Ξ^-$ decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (604 additional authors not shown)

Abstract: Using $(10087\pm44)\times 10^{6}$ $J/ψ$ events collected with the BESIII detector, numerous $Ξ^-$ and $Λ$ decay asymmetry parameters are simultaneously determined from the process $J/ψ\to Ξ^- \barΞ^+ \to Λ(pπ^-) π^- \barΛ(\bar{n} π^0) π^+$ and its charge-conjugate channel. Compared to the world averages, the precisions of $α_0$ for $Λ\to nπ^0$ and $\barα_0$ for $\barΛ \to \bar{n}π^0$ are improved by factors of 4 and 1.7, respectively. The ratio of decay asymmetry parameters of $Λ\to nπ^0$ to that of $Λ\to pπ^-$, $\langle α_0 \rangle/ \langle α_{Λ-} \rangle $, is determined to be $ 0.873 \pm 0.012^{+0.011}_{-0.010}$, where the first and the second uncertainties are statistical and systematic, respectively. The ratio is smaller than unity by more than $5σ$, which signifies the existence of the $ΔI = 3/2$ transition in $Λ$ for the first time. Besides, we test for CP violation in $Ξ^- \to Λπ^-$ and in $Λ\to n π^{0}$ with the best precision to date.
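As a rough consistency check of the "more than $5σ$" statement, the gap between the ratio and unity can be divided by the combined uncertainty. The sketch adds the statistical error in quadrature to the systematic error on the side facing 1.0, and it ignores correlations. That is a simplifying assumption, not how the collaboration computed its significance. The function name is hypothetical.

```python
import math

def deviation_from_unity(ratio, stat, syst_toward_one):
    """Rough number of standard deviations between `ratio` and 1.0.

    Adds the statistical error and the systematic error on the side facing
    1.0 in quadrature, ignoring any correlations (a simplifying assumption).
    """
    sigma = math.hypot(stat, syst_toward_one)
    return (1.0 - ratio) / sigma

# The ratio is below 1, so the +0.011 systematic is the one pointing toward unity.
n_sigma = deviation_from_unity(0.873, 0.012, 0.011)
print(f"about {n_sigma:.1f} sigma below unity")
```

This naive estimate gives roughly $7.8σ$, which is comfortably above the $5σ$ threshold quoted in the abstract.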

Submitted 8 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: 8 pages, 2 figures, 1 table

arXiv:2309.13883 [pdf, other]

doi 10.1007/JHEP01(2024)180

Measurement of the $e^{+}e^{-} \to K_{S}^{0} K_{L}^{0} π^{0}$ cross sections from $\sqrt{s}=$ 2.000 to 3.080 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (604 additional authors not shown)

Abstract: Based on $e^{+}e^{-}$ collision data collected at center-of-mass energies from 2.000 to 3.080 GeV by the BESIII detector at the BEPCII collider, a partial wave analysis is performed for the process $e^{+}e^{-}\to K_{S}^{0} K_{L}^{0} π^{0}$. The results allow the Born cross sections of the process $e^{+}e^{-}\to K_{S}^{0} K_{L}^{0} π^{0}$, as well as its subprocesses $e^{+}e^{-}\to K^{*}(892)^{0}\bar{K}^{0}$ and $K^{*}_{2}(1430)^{0}\bar{K}^{0}$ to be measured. The Born cross sections for $e^{+}e^{-}\to K_{S}^{0}K_{L}^{0}π^{0}$ are consistent with previous measurements by BaBar, but with substantially improved precision. The Born cross section lineshape of the process $e^{+}e^{-}\to K^{*}(892)^{0}\bar{K}^{0}$ is consistent with a vector meson state around 2.2 GeV with a significance of 3.2$σ$. A Breit-Wigner fit determines its mass as $M_Y=(2164.7\pm9.1\pm3.1)~{\rm{MeV}}/c^{2}$ and its width as $Γ_{Y}=(32.4\pm21.0\pm1.8)~\rm{MeV}$.

Submitted 26 February, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Journal ref: JHEP01(2024)180

arXiv:2309.12682 [pdf, other]

The comparison of two Zagreb-Fermat eccentricity indices

Authors: Xiangrui Pan, Cheng Zeng, Longyu Li, Gengji Li

Abstract: In this paper, we focus on comparing the first and second Zagreb-Fermat eccentricity indices of graphs. We show that $$\frac{\sum_{uv\in E\left( G \right)}{\varepsilon_3\left( u \right) \varepsilon_3\left( v \right)}}{m\left( G \right)} \leq \frac{\sum_{u\in V\left( G \right)}{\varepsilon_{3}^{2}\left( u \right)}}{n\left( G \right)} $$ holds for all acyclic and unicyclic graphs. Besides, we verify that the inequality does not necessarily hold for graphs with at least two cycles.
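Both sides of the inequality can be evaluated on small graphs by brute force. The sketch assumes the usual definitions, which the abstract does not restate. The Fermat distance is $F(u,v,w)=\min_{x\in V}\,[d(x,u)+d(x,v)+d(x,w)]$. The Fermat eccentricity is $\varepsilon_3(u)=\max_{v,w\in V} F(u,v,w)$. The code is a checker for small connected graphs, roughly $O(n^4)$, and is not a proof technique. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations_with_replacement

def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def fermat_eccentricities(adj):
    """eps_3(u) = max over v, w of min over x of d(x,u)+d(x,v)+d(x,w) (assumed definition)."""
    nodes = list(adj)
    d = {u: bfs_distances(adj, u) for u in nodes}  # assumes a connected graph

    def fermat_distance(u, v, w):
        return min(d[x][u] + d[x][v] + d[x][w] for x in nodes)

    return {u: max(fermat_distance(u, v, w)
                   for v, w in combinations_with_replacement(nodes, 2))
            for u in nodes}

def zagreb_fermat_averages(adj):
    """Return (edge average of eps(u)*eps(v), vertex average of eps(u)^2)."""
    eps = fermat_eccentricities(adj)
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    lhs = sum(eps[u] * eps[v] for u, v in edges) / len(edges)
    rhs = sum(e * e for e in eps.values()) / len(adj)
    return lhs, rhs

# Star K_{1,3}: the centre has eps_3 = 2 and each leaf has eps_3 = 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(zagreb_fermat_averages(star))  # (6.0, 7.75), so LHS <= RHS for this tree
```

On the path with four vertices, every vertex has $\varepsilon_3 = 3$, so both sides equal 9 and the inequality is tight.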

Submitted 12 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.11747 [pdf, other]

MarkNerf: Watermarking for Neural Radiance Field

Authors: Lifeng Chen, Jia Liu, Yan Ke, Wenquan Sun, Weina Dong, Xiaozhong Pan

Abstract: A watermarking algorithm is proposed in this paper to address the copyright protection issue of implicit 3D models. The algorithm involves embedding watermarks into the images in the training set through an embedding network, and subsequently utilizing the NeRF model for 3D modeling. A copyright verifier is employed to generate a backdoor image by providing a secret perspective as input to the neural radiance field. Subsequently, a watermark extractor is devised using the hyperparameterization method of the neural network to extract the embedded watermark image from that perspective. In a black box scenario, if there is a suspicion that the 3D model has been used without authorization, the verifier can extract watermarks from a secret perspective to verify network copyright. Experimental results demonstrate that the proposed algorithm effectively safeguards the copyright of 3D models. Furthermore, the extracted watermarks exhibit favorable visual effects and demonstrate robust resistance against various types of noise attacks.

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.11149 [pdf, other]

Electrostatic environment and Majorana bound states in full-shell topological insulator nanowires

Authors: Li Chen, Xiao-Hong Pan, Zhan Cao, Dong E. Liu, Xin Liu

Abstract: The combination of a superconductor (SC) and a topological insulator (TI) nanowire was proposed as a potential candidate for realizing Majorana zero modes (MZMs). In this study, we adopt the Schrödinger-Poisson formalism to incorporate the electrostatic environment inside the nanowire and systematically explore its topological properties. Our calculations reveal that the proximity to the SC induces a band bending effect, leading to a non-uniform potential across the TI nanowire. As a consequence, there is an upward shift of the Fermi level within the conduction band. This gives rise to the coexistence of surface and bulk states, localized in an accumulation layer adjacent to the TI-SC interface. When magnetic flux is applied, these occupied states have different flux-penetration areas, suppressing the superconducting gap. However, this impact can be mitigated by increasing the radius of the nanowire. Finally, we demonstrate that MZMs can be achieved across a wide range of parameters centered around one applied flux quantum, $φ_0 = h/2e$. Within this regime, MZMs can be realized even in the presence of conduction bands, which are not affected by the band bending effect. These findings provide valuable insights into the practical realization of MZMs in TI nanowire-based devices, especially in the presence of a complicated electrostatic environment.

Submitted 22 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.10503 [pdf, other]

Steganography for Neural Radiance Fields by Backdooring

Authors: Weina Dong, Jia Liu, Yan Ke, Lifeng Chen, Wenquan Sun, Xiaozhong Pan

Abstract: The utilization of implicit representation for visual data (such as images, videos, and 3D models) has recently gained significant attention in computer vision research. In this letter, we propose a novel model steganography scheme with implicit neural representation. The message sender leverages Neural Radiance Fields (NeRF) and its viewpoint synthesis capabilities by introducing a viewpoint as a key. The NeRF model generates a secret viewpoint image, which serves as a backdoor. Subsequently, we train a message extractor using overfitting to establish a one-to-one mapping between the secret message and the secret viewpoint image. The sender delivers the trained NeRF model and the message extractor to the receiver over the open channel, and the receiver utilizes the key shared by both parties to obtain the rendered image in the secret view from the NeRF model, and then obtains the secret message through the message extractor. The inherent complexity of the viewpoint information prevents attackers from stealing the secret message accurately. Experimental results demonstrate that the message extractor trained in this letter achieves high-capacity steganography with fast performance, achieving 100% accuracy in message extraction. Furthermore, the extensive viewpoint key space of NeRF ensures the security of the steganography scheme.

Submitted 19 September, 2023; originally announced September 2023.

Comments: 6 pages, 7 figures

arXiv:2309.10438 [pdf, other]

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

Authors: Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, Rongrong Ji

Abstract: Diffusion models are emerging expressive generative models, in which a large number of time steps (inference steps) are required for a single image generation. To accelerate such a tedious process, reducing steps uniformly is considered an undisputed principle of diffusion models. We consider that such a uniform assumption is not the optimal solution in practice; i.e., we can find different optimal time steps for different models. Therefore, we propose to search for the optimal time-step sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. Specifically, we first design a unified search space that consists of all possible time steps and various architectures. Then, a two-stage evolutionary algorithm is introduced to find the optimal solution in the designed search space. To further accelerate the search process, we employ the FID score between generated and real samples to estimate the performance of the sampled examples. As a result, the proposed method is (i) training-free, obtaining the optimal time steps and model architecture without any training process; (ii) orthogonal to most advanced diffusion samplers, and can be integrated with them to gain better sample quality; and (iii) generalized, where the searched time steps and architectures can be directly applied on different diffusion models with the same guidance scale. Experimental results show that our method achieves excellent performance by using only a few time steps, e.g., a 17.86 FID score on ImageNet 64 $\times$ 64 with only four steps, compared to 138.66 with DDIM. The code is available at https://github.com/lilijiangg/AutoDiffusion.
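The time-step half of the search can be pictured with a minimal evolutionary loop. Candidates are descending sets of `num_steps` time steps drawn from `total_steps`. Mutation swaps one step for an unused one, and the better half of the population survives each generation. This is a simplified sketch, not the AutoDiffusion implementation: it has a single stage, no architecture search, and no crossover. The `score` callable stands in for the FID estimate, and the toy target schedule is made up for illustration.

```python
import random

def mutate(steps, total_steps, rng):
    """Replace one chosen time step with an unused one; keep the tuple descending."""
    steps = list(steps)
    unused = sorted(set(range(total_steps)) - set(steps))
    steps[rng.randrange(len(steps))] = rng.choice(unused)
    return tuple(sorted(steps, reverse=True))

def evolve_time_steps(score, total_steps=1000, num_steps=4,
                      population=20, generations=30, seed=0):
    """Evolutionary search for a descending sequence of `num_steps` time steps.

    `score` maps a candidate sequence to a cost (lower is better); it stands
    in for an FID estimate on a handful of generated samples.
    """
    rng = random.Random(seed)
    pop = [tuple(sorted(rng.sample(range(total_steps), num_steps), reverse=True))
           for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: population // 2]  # elitism: keep the better half
        children = [mutate(rng.choice(parents), total_steps, rng)
                    for _ in range(population - len(parents))]
        pop = parents + children
    return min(pop, key=score)

# Toy stand-in for an FID estimate: prefer steps near a made-up schedule.
TARGET = (999, 750, 500, 250)

def toy_cost(steps):
    return sum(abs(a - b) for a, b in zip(steps, TARGET))

print(evolve_time_steps(toy_cost))
```

Because the best half always survives, the best cost never gets worse from one generation to the next. Each evaluation of `score` would be the expensive part in practice, which is why the abstract relies on a cheap FID estimate.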

Submitted 23 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.10305 [pdf, other]

Baichuan 2: Open Large-scale Language Models

Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang , et al. (30 additional authors not shown)

Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.

Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2

arXiv:2309.09310 [pdf, other]

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Authors: Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan

Abstract: Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on ponderous computational costs and labor-expensive training data. Current efficient GAN learning techniques often fall into two orthogonal aspects: i) model slimming via reduced calculation costs; ii) data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly promote the synergy of model-efficient and label-efficient learning. UGC sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages sequentially, which formulates a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient, and performance-excellent model.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.08744 [pdf, other]

Personalized Food Image Classification: Benchmark Datasets and New Baseline

Authors: Xinyue Pan, Jiangpeng He, Fengqing Zhu

Abstract: Food image classification is a fundamental step of image-based dietary assessment, enabling automated nutrient analysis from food images. Many current methods employ deep neural networks to train on generic food image datasets that do not reflect the dynamism of real-life food consumption patterns, in which food images appear sequentially over time, reflecting the progression of what an individual consumes. Personalized food classification aims to address this problem by training a deep neural network using food images that reflect the consumption pattern of each individual. However, this problem is under-explored and there is a lack of benchmark datasets with individualized food consumption patterns due to the difficulty in data collection. In this work, we first introduce two benchmark personalized datasets including the Food101-Personal, which is created based on surveys of daily dietary patterns from participants in the real world, and the VFNPersonal, which is developed based on a dietary study. In addition, we propose a new framework for personalized food image classification by leveraging self-supervised learning and temporal image feature information. Our method is evaluated on both benchmark datasets and shows improved performance compared to existing works. The dataset has been made available at: https://skynet.ecn.purdue.edu/~pan161/dataset_personal.html

Submitted 15 September, 2023; originally announced September 2023.

Comments: Accepted by IEEE Asilomar conference (2023)

arXiv:2309.08172 [pdf, other]

LASER: LLM Agent with State-Space Exploration for Web Navigation

Authors: Kaixin Ma, Hongming Zhang, Hongwei Wang, Xiaoman Pan, Wenhao Yu, Dong Yu

Abstract: Large language models (LLMs) have been successfully adapted for interactive decision-making tasks like web navigation. While achieving decent performance, previous methods implicitly assume a forward-only execution mode for the model, where they only provide oracle trajectories as in-context examples to guide the model on how to reason in the environment. Consequently, the model could not handle more challenging scenarios not covered in the in-context examples, e.g., mistakes, leading to sub-optimal performance. To address this issue, we propose to model the interactive task as state space exploration, where the LLM agent transitions among a pre-defined set of states by performing actions to complete the task. This formulation enables flexible backtracking, allowing the model to recover from errors easily. We evaluate our proposed LLM Agent with State-Space ExploRation (LASER) on both the WebShop task and amazon.com. Experimental results show that LASER significantly outperforms previous methods and closes the gap with human performance on the web navigation task.
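The state-space formulation can be pictured as a small finite-state machine. Each state exposes only the actions valid in it, and "back" actions give the agent a way to recover from a wrong choice. The state names, action names and transitions below are illustrative assumptions for a WebShop-like site, not LASER's exact definitions. In the full system, an LLM would choose among `valid_actions()` at each step, guided by state-specific instructions.

```python
# Hypothetical state space for a shopping site: each state maps its allowed
# actions to the state that action leads to.
TRANSITIONS = {
    "search": {"search": "results"},
    "results": {"select_item": "item", "next_page": "results",
                "back_to_search": "search"},
    "item": {"buy": "stopped", "back_to_results": "results",
             "back_to_search": "search"},
    "stopped": {},
}

class StateSpaceAgentEnv:
    """Tracks the agent's current state and rejects actions invalid there."""

    def __init__(self, start="search"):
        self.state = start
        self.history = [start]

    def valid_actions(self):
        return sorted(TRANSITIONS[self.state])

    def step(self, action):
        if action not in TRANSITIONS[self.state]:
            raise ValueError(f"{action!r} is not allowed in state {self.state!r}")
        self.state = TRANSITIONS[self.state][action]
        self.history.append(self.state)
        return self.state

# An episode with one backtrack: the first item is wrong, so the agent
# returns to the results page and picks another before buying.
env = StateSpaceAgentEnv()
for action in ["search", "select_item", "back_to_results", "select_item", "buy"]:
    env.step(action)
print(env.history)
```

Keeping the legal moves explicit per state is what lets the agent backtrack, instead of relying only on forward-only in-context trajectories.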

Submitted 21 February, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 4 pages, 2 figures

arXiv:2309.06368 [pdf, ps, other]

Measurements of the absolute branching fractions of $Ω^-$ decays and test of the $ΔI = 1/2$ rule

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (599 additional authors not shown)

Abstract: Based on a data set of $(27.12\pm0.10)\times 10^8$ $ψ(3686)$ events collected at the BESIII experiment, the absolute branching fractions of the three dominant $Ω^-$ decays are measured to be $\mathcal{B}_{Ω^- \to Ξ^0 π^-} = (25.03\pm0.44\pm0.53)\%$, $\mathcal{B}_{Ω^- \to Ξ^- π^0} = (8.43\pm0.52\pm0.28)\%$, and $\mathcal{B}_{Ω^- \to ΛK^-} = (66.3\pm0.8\pm2.0)\%$, where the first and second uncertainties are statistical and systematic, respectively. The ratio between $\mathcal{B}_{Ω^- \to Ξ^0 π^-}$ and $\mathcal{B}_{Ω^- \to Ξ^- π^0}$ is determined to be $2.97\pm0.19\pm0.11$, which is in good agreement with the PDG value of $2.74\pm0.15$, but greater by more than four standard deviations than the theoretical prediction of 2 obtained from the $ΔI = 1/2$ rule.
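The quoted ratio can be roughly reproduced from the two branching fractions with standard error propagation for a quotient. This sketch treats the two measurements as uncorrelated, which is an assumption. It recovers $2.97\pm0.19$ for the central value and statistical error. It gives a systematic error of about 0.12 rather than the quoted 0.11, presumably because correlated systematics partly cancel in the real analysis. The function name is hypothetical.

```python
import math

def ratio_with_errors(a, a_stat, a_syst, b, b_stat, b_syst):
    """Ratio a/b with uncertainties propagated assuming a and b are uncorrelated."""
    r = a / b
    stat = r * math.hypot(a_stat / a, b_stat / b)
    syst = r * math.hypot(a_syst / a, b_syst / b)
    return r, stat, syst

# B(Omega- -> Xi0 pi-) and B(Omega- -> Xi- pi0) in percent, from the abstract.
r, stat, syst = ratio_with_errors(25.03, 0.44, 0.53, 8.43, 0.52, 0.28)
print(f"ratio = {r:.2f} +- {stat:.2f} (stat) +- {syst:.2f} (syst)")
# Distance from the Delta I = 1/2 expectation of 2, errors added in quadrature.
print(f"{(r - 2) / math.hypot(stat, syst):.1f} standard deviations above 2")
```

Either way, the deviation from the $ΔI = 1/2$ prediction of 2 comes out above four standard deviations, in line with the abstract.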

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.05760 [pdf, ps, other]

doi 10.1103/PhysRevLett.132.131903

Observation of $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ in the amplitude analysis of $D^{+} \to K_{S}^{0}π^+η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (604 additional authors not shown)

Abstract: We perform for the first time an amplitude analysis of the decay $D^{+}\to K_{S}^{0}π^+η$ and report the observation of the decay $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ using 2.93 fb$^{-1}$ of $e^+e^-$ collision data taken at a center-of-mass energy of 3.773 GeV with the BESIII detector. As the only W-annihilation free decay among $D$ to $a_{0}(980)$-pseudoscalar, $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ is the ideal decay to extract the contributions of the external and internal $W$-emission amplitudes involving $a_{0}(980)$ and study the final-state interactions. The absolute branching fraction of $D^{+}\to K_{S}^{0}π^+η$ is measured to be $(1.27\pm0.04_{\rm stat.}\pm0.03_{\rm syst.})\%$. The product branching fractions of $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ with $a_{0}(980)^{+}\to π^+η$ and $D^{+}\to π^+ K_0^*(1430)^0$ with $K_0^*(1430)^0\to K_{S}^{0}η$ are measured to be $(1.33\pm0.05_{\rm stat.}\pm0.04_{\rm syst.})\%$ and $(0.14\pm0.03_{\rm stat.}\pm0.01_{\rm syst.})\%$, respectively.

Submitted 29 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Journal ref: Phys. Rev. Lett. 132, 131903 (2024)

arXiv:2309.05484 [pdf, other]

doi 10.1103/PhysRevD.109.L071103

Observation of the Singly Cabibbo-Suppressed Decay $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (605 additional authors not shown)

Abstract: The singly Cabibbo-suppressed decay $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$ is observed for the first time with a statistical significance of $6.4σ$ by using 4.5 fb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies between 4.600 and 4.699 GeV with the BESIII detector at BEPCII. The absolute branching fraction of $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$ is measured to be $(3.8\pm1.3_{\rm stat}\pm0.2_{\rm syst})\times 10^{-4}$ in a model-independent approach. This is the first observation of a Cabibbo-suppressed $Λ_{c}^{+}$ decay involving $Σ^-$ in the final state. The ratio of branching fractions between $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$ and the Cabibbo-favored decay $Λ_{c}^{+}\to Σ^- π^+π^+$ is calculated to be $(0.4 \pm 0.1)s_{c}^{2}$, where $s_{c} \equiv \sinθ_c = 0.2248$ with $θ_c$ the Cabibbo mixing angle. This ratio significantly deviates from $1.0s_{c}^{2}$ and provides important information for the understanding of nonfactorization contributions in $Λ_{c}^{+}$ decays.
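For scale, the ratio quoted in units of $s_c^2$ can be turned into a plain number using the abstract's own value $s_c = 0.2248$. This is simple arithmetic on the rounded values, not an additional result from the paper.

```latex
\begin{aligned}
s_c^2 &= 0.2248^2 \approx 0.0505 \\
\frac{\mathcal{B}(\Lambda_c^+\to\Sigma^-K^+\pi^+)}{\mathcal{B}(\Lambda_c^+\to\Sigma^-\pi^+\pi^+)}
  &= (0.4 \pm 0.1)\,s_c^2 \approx 0.020 \pm 0.005
  \qquad \text{vs.} \qquad 1.0\,s_c^2 \approx 0.0505
\end{aligned}
```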

Submitted 8 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 8 pages, 2 figures

Journal ref: Phys. Rev. D 109, L071103 (2024)

arXiv:2309.04977 [pdf, other]

doi 10.1109/IJCNN54540.2023.10191577

RGAT: A Deeper Look into Syntactic Dependency Information for Coreference Resolution

Authors: Yuan Meng, Xuhao Pan, Jun Chang, Yue Wang

Abstract: Although syntactic information benefits many NLP tasks, how to combine it with contextual information between words to solve coreference resolution still needs further exploration. In this paper, we propose an end-to-end parser that combines pre-trained BERT with a Syntactic Relation Graph Attention Network (RGAT) to examine more closely the role of syntactic dependency information in coreference resolution. Specifically, we first propose the RGAT model and then use it to model the syntactic dependency graph and learn better task-specific syntactic embeddings. An integrated architecture combines BERT embeddings with these syntactic embeddings to produce blended representations for the downstream task. Experiments on the public Gendered Ambiguous Pronouns (GAP) dataset show that, with supervised learning over the syntactic dependency graph and without fine-tuning the entire BERT model, our approach raises the F1-score from 80.3% for the previous best model (RGCN-with-BERT) to 82.5%, and from 78.5% for BERT embeddings alone to 82.5%. Experimental results on another public dataset, OntoNotes 5.0, show that incorporating syntactic dependency information learned by RGAT also improves the model's performance.
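The abstract describes RGAT only at a high level. The sketch below is a rough, pure-Python illustration of the general idea, not the authors' RGAT: attention over dependency-parse neighbours with a learned vector per dependency label, followed by concatenation with contextual embeddings. The function names, the GAT-style LeakyReLU scoring over concatenated projections, and the random vectors standing in for frozen BERT outputs are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relational_gat_layer(h, edges, W, a, rel_emb):
    """One relation-aware graph attention layer (hypothetical sketch).

    h:       token vectors (stand-ins for contextual embeddings), length n
    edges:   dependency arcs as (head, dependent, relation_label)
    W:       projection matrix, d_out x d_in
    a:       attention vector, length 2 * d_out + d_rel
    rel_emb: relation label -> vector of length d_rel (must include "self")
    """
    n = len(h)
    Wh = [matvec(W, v) for v in h]
    # Each token attends to itself and to both ends of every arc it touches.
    nbrs = {i: [(i, "self")] for i in range(n)}
    for head, dep, rel in edges:
        nbrs[head].append((dep, rel))
        nbrs[dep].append((head, rel))
    out = []
    for i in range(n):
        # Score each neighbour from [W h_i ; W h_j ; r_ij] (list concatenation).
        scores = []
        for j, rel in nbrs[i]:
            feat = Wh[i] + Wh[j] + rel_emb[rel]
            scores.append(leaky_relu(sum(ai * fi for ai, fi in zip(a, feat))))
        alpha = softmax(scores)
        # New representation: attention-weighted sum of projected neighbours.
        out.append([sum(alpha[k] * Wh[j][t] for k, (j, _) in enumerate(nbrs[i]))
                    for t in range(len(Wh[i]))])
    return out


def blend(bert_vecs, syntax_vecs):
    """Concatenate contextual and syntactic embeddings per token."""
    return [b + s for b, s in zip(bert_vecs, syntax_vecs)]


# Toy usage: random vectors play the role of frozen BERT outputs.
random.seed(0)
d_in, d_out, d_rel = 4, 3, 2
tokens = ["She", "thanked", "him"]
bert = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in tokens]
arcs = [(1, 0, "nsubj"), (1, 2, "obj")]
rels = {r: [random.uniform(-1, 1) for _ in range(d_rel)] for r in ("self", "nsubj", "obj")}
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [random.uniform(-1, 1) for _ in range(2 * d_out + d_rel)]
syn = relational_gat_layer(bert, arcs, W, a, rels)
fused = blend(bert, syn)
print(len(fused), len(fused[0]))  # 3 7
```

Keeping the contextual vectors fixed and learning only the graph layer loosely mirrors the abstract's "without fine-tuning the entire BERT" setting. In practice these would be tensors in a deep-learning framework rather than Python lists.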

Submitted 10 September, 2023; originally announced September 2023.

Comments: 8 pages, 5 figures

MSC Class: 14J60 (Primary) 14F05; 14J26 (Secondary)

Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN)

arXiv:2309.04559 [pdf]

Spin injection across a III-V/chiral perovskite interface enabling spin accumulation at room temperature

Authors: Matthew P. Hautzinger, Xin Pan, Steven C. Hayden, Jiselle Y. Ye, Qi Jiang, Mickey J. Wilson, Yifan Dong, Emily K. Raulerson, Ian A. Leahy, Chun-Sheng Jiang, Joseph M. Luther, Yuan Lu, Katherine Jungjohann, Z. Valy Vardeny, Joseph J. Berry, Kirstin Alberi, Matthew C. Beard

Abstract: Spin accumulation in semiconductor structures at room temperature and without magnetic fields is key to enabling a broader range of optoelectronic functionality. Current efforts are limited by inherent inefficiencies associated with spin injection into semiconductor structures. Here, we demonstrate spin injection across chiral halide perovskite/III-V interfaces, achieving spin accumulation in a standard III-V semiconductor (Al$_x$Ga$_{1-x}$)$_{0.5}$In$_{0.5}$P multiple quantum well (MQW) light-emitting diode (LED). The spin accumulation in the MQW is detected via the emission of circularly polarized light with a degree of polarization of up to ~15%. The chiral perovskite/III-V interface was characterized with X-ray photoemission spectroscopy (XPS), cross-sectional scanning Kelvin probe force microscopy, and cross-sectional transmission electron microscopy (TEM) imaging, revealing a clean semiconductor/semiconductor interface at which the Fermi level can equilibrate. These findings demonstrate that chiral perovskite semiconductors can transform well-developed semiconductor platforms into ones that can also control spin.
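For reference, the degree of circular polarization is conventionally defined from the intensities of the two circular components. The abstract does not give the paper's exact sign convention, so the form below is an assumption. Inverting it shows what a ~15% polarization means as an intensity ratio.

```latex
P_{\rm circ} = \frac{I_{\sigma^+} - I_{\sigma^-}}{I_{\sigma^+} + I_{\sigma^-}}
\quad\Longrightarrow\quad
\frac{I_{\sigma^+}}{I_{\sigma^-}} = \frac{1 + P_{\rm circ}}{1 - P_{\rm circ}}
= \frac{1.15}{0.85} \approx 1.35 \qquad (P_{\rm circ} = 0.15)
```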

Submitted 14 November, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

Showing 151–200 of 1,062 results for author: Pan, X