-
TEQ: Trainable Equivalent Transformation for Quantization of LLMs
Authors:
Wenhua Cheng,
Yiyang Cai,
Kaokao Lv,
Haihao Shen
Abstract:
As large language models (LLMs) become more prevalent, there is a growing need for new and improved quantization methods that can meet the computational demands of these modern architectures while maintaining accuracy. In this paper, we present TEQ, a trainable equivalent transformation that preserves the FP32 precision of the model output while taking advantage of low-precision quantization, especially 3- and 4-bit weight-only quantization. The training process is lightweight, requiring only 1K steps and fewer than 0.1 percent of the original model's trainable parameters. Furthermore, the transformation does not add any computational overhead during inference. Our results are on par with the state-of-the-art (SOTA) methods on typical LLMs. Our approach can be combined with other methods to achieve even better performance. The code is available at https://github.com/intel/neural-compressor.
Submitted 16 October, 2023;
originally announced October 2023.
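The idea of an equivalent transformation in the TEQ abstract above can be illustrated with a toy example. The sketch below assumes the transformation is a per-input-channel scaling (activations divided by a scale, matching weight columns multiplied by it); the abstract does not spell this out, so treat that form as an assumption. The scale values, the helper names, and the simple round-to-nearest quantizer are all hypothetical, and the scales are hand-picked rather than trained.

```python
def linear(x, w):
    """Compute y[j] = sum_i x[i] * w[j][i]; rows of w are output channels."""
    return [sum(xi * wji for xi, wji in zip(x, row)) for row in w]


def scale_pair(x, w, s):
    """Divide activations by s and multiply the matching weight columns by s."""
    x_scaled = [xi / si for xi, si in zip(x, s)]
    w_scaled = [[wji * si for wji, si in zip(row, s)] for row in w]
    return x_scaled, w_scaled


def quantize_rtn(row, bits=4):
    """Symmetric round-to-nearest quantization of one row, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    step = max(abs(v) for v in row) / qmax or 1.0  # avoid a zero step
    return [round(v / step) * step for v in row]


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


x = [0.5, -2.0, 4.0]                            # toy activations
w = [[0.30, -0.02, 0.01], [-0.25, 0.04, 0.02]]  # toy weights
s = [1.0, 4.0, 8.0]                             # hypothetical, hand-picked scales

y_ref = linear(x, w)
x_s, w_s = scale_pair(x, w, s)

# In full precision the scaled pair reproduces the original output.
max_gap = max_abs_diff(y_ref, linear(x_s, w_s))

# After 4-bit quantization the two weight layouts give different errors.
err_plain = max_abs_diff(y_ref, linear(x, [quantize_rtn(r) for r in w]))
err_scaled = max_abs_diff(y_ref, linear(x_s, [quantize_rtn(r) for r in w_s]))

print(max_gap, err_plain, err_scaled)
```

For this toy input the scaled layout happens to quantize with a smaller output error, which is the effect a trained scale would aim for; nothing here reproduces the paper's training procedure or results.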
-
RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling
Authors:
Jingcheng Deng,
Liang Pang,
Huawei Shen,
Xueqi Cheng
Abstract:
Retrieval-augmented language models show promise in addressing issues like outdated information and hallucinations in language models (LMs). However, current research faces two main problems: 1) determining what information to retrieve, and 2) effectively combining retrieved information during generation. We argue that valuable retrieved information should not only be related to the current source text but also consider the future target text, given the nature of LMs that model future tokens. Moreover, we propose that aggregation using latent variables derived from a compact latent space is more efficient than utilizing explicit raw text, which is limited by context length and susceptible to noise. Therefore, we introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE). It encodes the text corpus into a latent space, capturing current and future information from both source and target text. Additionally, we leverage the VAE to initialize the latent space and adopt the probabilistic form of the retrieval generation paradigm by expanding the Gaussian prior distribution into a Gaussian mixture distribution. Theoretical analysis provides an optimizable upper bound for RegaVAE. Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
Submitted 23 October, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Measurement of the cross sections for $e^+e^-\toηπ^+π^-$ at center-of-mass energies between 2.00 and 3.08 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (605 additional authors not shown)
Abstract:
Using data samples collected at center-of-mass energies between 2.000 and 3.080 GeV with the BESIII detector operating at the BEPCII collider, a partial-wave analysis is performed on the process $e^+e^-\toηπ^+π^-$. In addition to the dominant $e^+e^-\toρη$ component, the $e^+e^-\to a_2(1320)π$ process is also sizeable, contributing up to 24% of the total reaction. The measured cross sections of the process $e^+e^-\toηπ^+π^-$ are systematically higher than those of BaBar by more than $3σ$ at center-of-mass energies between 2.000 and 2.300 GeV. In the cross section lineshape for $e^+e^-\to a_2(1320)π$, a resonant structure is observed with a significance of $5.5σ$, with $M=(2044\pm31\pm4)$ MeV/$c^2$, $Γ=(163\pm69\pm24)$ MeV and $\mathcal{B_{R}}\cdotΓ_{e^+e^-}^{R}=(34.6\pm17.1\pm6.0)$ eV or $(137.1\pm73.3\pm2.1)$ eV. In the cross section lineshape for $e^+e^-\toρη$, evidence of a dip structure around 2180 MeV/$c^2$ is observed with a statistical significance of $3.0σ$.
Submitted 28 November, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Causality and Independence Enhancement for Biased Node Classification
Authors:
Guoxin Chen,
Yongqing Wang,
Fangda Guo,
Qinglang Guo,
Jiangli Shao,
Huawei Shen,
Xueqi Cheng
Abstract:
Most existing methods that address out-of-distribution (OOD) generalization for node classification on graphs primarily focus on a specific type of data bias, such as label selection bias or structural bias. However, anticipating the type of bias in advance is extremely challenging, and designing models solely for one specific type may not necessarily improve overall generalization performance. Moreover, limited research has focused on the impact of mixed biases, which are more prevalent and demanding in real-world scenarios. To address these limitations, we propose a novel Causality and Independence Enhancement (CIE) framework, applicable to various graph neural networks (GNNs). Our approach estimates causal and spurious features at the node representation level and mitigates the influence of spurious correlations through the backdoor adjustment. Meanwhile, an independence constraint is introduced to improve the discriminability and stability of causal and spurious features in complex biased environments. Essentially, CIE eliminates different types of data biases from a unified perspective, without the need to design separate methods for each bias as before. To evaluate the performance under specific types of data biases, mixed biases, and low-resource scenarios, we conducted comprehensive experiments on five publicly available datasets. Experimental results demonstrate that our approach CIE not only significantly enhances the performance of GNNs but also outperforms state-of-the-art debiased node classification methods.
Submitted 4 November, 2023; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Differentially Private Non-convex Learning for Multi-layer Neural Networks
Authors:
Hanpu Shen,
Cheng-Long Wang,
Zihang Xiang,
Yiming Ying,
Di Wang
Abstract:
This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model where the random noise has zero mean and the link function is both bounded and Lipschitz continuous. We propose several algorithms, and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example.
In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.
Submitted 12 October, 2023;
originally announced October 2023.
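The DP-SGD procedure of Abadi et al. (2016) mentioned in the abstract above clips each per-example gradient, adds Gaussian noise to the clipped sum, and averages. The following is a minimal sketch of one such update in plain Python; the function name `dp_sgd_step`, its parameter names, and the toy numbers are illustrative assumptions and are not taken from the paper, and no privacy accounting is performed.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per-example gradients, add noise, average, step."""
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        norm = math.sqrt(sum(v * v for v in grad))
        # Rescale so that the gradient's L2 norm is at most clip_norm.
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, v in enumerate(grad):
            summed[i] += v * factor
    # Gaussian noise with standard deviation noise_multiplier * clip_norm
    # is added to each coordinate of the clipped sum.
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
new_params = dp_sgd_step([1.0, 1.0], grads, lr=0.1,
                         clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(new_params)
```

Setting `noise_multiplier=0.0` removes the noise and leaves only the clipping, which is a convenient way to check the arithmetic by hand.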
-
Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics
Authors:
Chen Zhao,
Kuan-Jui Su,
Chong Wu,
Xuewei Cao,
Qiuying Sha,
Wu Li,
Zhe Luo,
Tian Qin,
Chuan Qiu,
Lan Juan Zhao,
Anqi Liu,
Lindong Jiang,
Xiao Zhang,
Hui Shen,
Weihua Zhou,
Hong-Wen Deng
Abstract:
Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed methods achieved R^2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
Submitted 12 March, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma
Authors:
Zijie Fang,
Yihan Liu,
Yifeng Wang,
Xiangyang Zhang,
Yang Chen,
Changjing Cai,
Yiyang Lin,
Ying Han,
Zhi Wang,
Shan Zeng,
Hong Shen,
Jun Tan,
Yongbing Zhang
Abstract:
Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.
Submitted 11 October, 2023;
originally announced October 2023.
-
Search for $J/ψ$ weak decays containing $D$ meson
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (600 additional authors not shown)
Abstract:
Using a sample of about 10 billion $J/ψ$ events with the BESIII detector, we search for the weak decays of $J/ψ\to \bar{D}^0π^0 + c.c.$, $J/ψ\to \bar{D}^0η+ c.c.$, $J/ψ\to \bar{D}^0ρ^0 + c.c.$, $J/ψ\to D^-π^+ + c.c.$, and $J/ψ\to D^-ρ^+ + c.c.$. Since no significant signal is observed, we set the upper limits of the branching fractions of these decays to be $\mathcal{B}(J/ψ\to \bar{D}^0π^0 + c.c.) < 4.7 \times 10^{-7}$, $\mathcal{B}(J/ψ\to \bar{D}^0η+ c.c.) < 6.8 \times 10^{-7}$, $\mathcal{B}(J/ψ\to \bar{D}^0ρ^0 + c.c.) < 5.2 \times 10^{-7}$, $\mathcal{B}(J/ψ\to D^-π^+ + c.c.) < 7.0 \times 10^{-8}$, and $\mathcal{B}(J/ψ\to D^-ρ^+ + c.c.) < 6.0 \times 10^{-7}$ at the 90\% confidence level.
Submitted 11 October, 2023;
originally announced October 2023.
-
How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging
Authors:
Qianou Ma,
Hua Shen,
Kenneth Koedinger,
Tongshuang Wu
Abstract:
Large Language Models (LLMs) now excel at generative skills and can create content at impeccable speeds. However, they are imperfect and still make various mistakes. In a Computer Science education context, as these models are widely recognized as "AI pair programmers," it becomes increasingly important to train students on evaluating and debugging the LLM-generated code. In this work, we introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM-agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student performance on debugging by 12% in the pre-to-post test.
Submitted 13 June, 2024; v1 submitted 8 October, 2023;
originally announced October 2023.
-
Resource Efficient Boolean Function Solver on Quantum Computer
Authors:
Xiang Li,
Hanxiang Shen,
Weiguo Gao,
Yingzhou Li
Abstract:
Nonlinear boolean equation systems play an important role in a wide range of applications. Grover's algorithm is one of the best-known quantum search algorithms in solving the nonlinear boolean equation system on quantum computers. In this paper, we propose three novel techniques to improve the efficiency under Grover's algorithm framework. A W-cycle circuit construction introduces a recursive idea to increase the solvable number of boolean equations given a fixed number of qubits. Then, a greedy compression technique is proposed to reduce the oracle circuit depth. Finally, a randomized Grover's algorithm randomly chooses a subset of equations to form a random oracle every iteration, which further reduces the circuit depth and the number of ancilla qubits. Numerical results on boolean quadratic equations demonstrate the efficiency of the proposed techniques.
Submitted 8 October, 2023;
originally announced October 2023.
-
Measurement of $e^{+}e^{-}\rightarrowηJ/ψ$ Cross Section from $\sqrt{s}=$ 3.808 GeV to 4.951 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (608 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of 22.42 fb$^{-1}$ collected by the BESIII detector operating at the BEPCII storage ring, we measure the cross sections of the $e^{+}e^{-}\rightarrowηJ/ψ$ process at center-of-mass energies from 3.808 to 4.951 GeV. Three structures are observed in the line shape of the measured cross sections. A maximum-likelihood fit with $ψ(4040)$, two additional resonances, and a non-resonant component is performed. The mass and width of the first additional state are $(4219.7\pm2.5\pm4.5) \rm{MeV}/\rm{c}^2$ and $(80.7\pm4.4\pm1.4) \rm{MeV}$, respectively, consistent with the $ψ(4230)$. For the second state, the mass and width are $(4386\pm13\pm17) \rm{MeV}/\rm{c}^2$ and $(177\pm32\pm13) \rm{MeV}$, respectively, consistent with the $ψ(4360)$. The first uncertainties are statistical and the second ones are systematic. The statistical significance of $ψ(4040)$ is $8.0σ$ and those for $ψ(4230)$ and $ψ(4360)$ are more than $10.0σ$.
Submitted 5 October, 2023;
originally announced October 2023.
-
Concurrent spin squeezing and light squeezing in an atomic ensemble
Authors:
Shenchao Jin,
Junlei Duan,
Youwei Zhang,
Xichang Zhang,
Han Bao,
Heng Shen,
Liantuan Xiao,
Suotang Jia,
Mingfeng Wang,
Yanhong Xiao
Abstract:
Squeezed spin states and squeezed light are both key resources for quantum metrology and quantum information science, but have largely been separately investigated experimentally so far. Simultaneous generation of these two types of quantum states in one experiment setup is an intriguing goal, and could also enable the study of the analogies and distinctions between atoms and light from a new perspective. Here we report an experimental demonstration of concurrent spin squeezing and light squeezing in a hot atomic ensemble, by judiciously engineering a symmetric atom-light interaction Hamiltonian. The squeezing process is deterministic, yielding fixed squeezing directions for the light field and the collective atomic spin. Furthermore, the squeezed light modes lie in the multiple frequency sidebands of a single spatial mode. This novel type of dual squeezed state may be a promising resource for quantum information science and technologies. Our method can be extended to other quantum platforms such as optomechanical and cold atom systems.
Submitted 3 October, 2023;
originally announced October 2023.
-
First measurement of $ΛN$ inelastic scattering with $Λ$ from $e^{+} e^{-} \rightarrow J/ψ\to Λ\barΛ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (626 additional authors not shown)
Abstract:
Using an $e^+ e^-$ collision data sample of $(10087 \pm 44)\times10^6 ~J/ψ$ events taken at the center-of-mass energy of $3.097~\rm{GeV}$ by the BESIII detector at the BEPCII collider, the process $Λ+N \rightarrow Σ^+ + X$ is studied for the first time employing a novel method. The $Σ^{+}$ hyperons are produced by the collisions of $Λ$ hyperons from $J/ψ$ decays with nuclei in the material of the BESIII detector. The total cross section of $Λ+ ^{9}{\rm Be} \rightarrow Σ^+ + X$ is measured to be $σ= (37.3 \pm 4.7 \pm 3.5)~{\rm mb}$ at $Λ$ beam momenta within $[1.057, 1.091]~{\rm GeV}/c$, where the uncertainties are statistical and systematic, respectively. This analysis is the first study of $Λ$-nucleon interactions at an $e^+ e^-$ collider, providing information and constraints relevant for the strong-interaction potential, the origin of color confinement, the unified model for baryon-baryon interactions, and the internal structure of neutron stars.
Submitted 1 October, 2023;
originally announced October 2023.
-
Reinforcement Learning Based Neighbour Selection for VANET with Adaptive Trust Management
Authors:
Orvila Sarker,
Hong Shen,
M. Ali Babar
Abstract:
Successful information propagation from source to destination in a Vehicular Ad-hoc Network (VANET) can be hampered by the presence of neighbouring attacker nodes causing unwanted packet dropping. Potential attackers change their behaviour over time and remain undetected due to the ad-hoc nature of VANET. Capturing the dynamic attacker behaviour and updating the corresponding neighbourhood information without compromising the quality of service requirements is an ongoing challenge. This work proposes a Reinforcement Learning (RL) based neighbour selection framework for VANET with an adaptive trust management system to capture the behavioural changes of potential attackers and to dynamically update the neighbourhood information. In contrast to existing works, we consider trust and link-life time in unison as neighbour selection criteria to achieve trustworthy communication. Our adaptive trust model takes into account the social relationship, time and confidence in trust observation to avoid four types of attackers. To update the neighbourhood information, our framework sets the learning rate of the RL agent according to the velocities of the neighbour nodes to improve the model's adaptability to network topology changes. Results demonstrate that our method can take fewer hops to the destination for large network sizes while responding up to 54 percent faster than a baseline method. Also, the proposed model can outperform the other baseline method by reducing the packet dropping rate caused by the attacker by up to 57 percent.
Submitted 1 October, 2023;
originally announced October 2023.
-
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things
Authors:
Samiul Alam,
Tuo Zhang,
Tiantian Feng,
Hui Shen,
Zhichao Cao,
Dong Zhao,
JeongGil Ko,
Kiran Somasundaram,
Shrikanth S. Narayanan,
Salman Avestimehr,
Mi Zhang
Abstract:
There is a significant relevance of federated learning (FL) in the realm of Artificial Intelligence of Things (AIoT). However, most existing FL works do not use datasets collected from authentic IoT devices and thus do not capture unique modalities and inherent challenges of IoT data. To fill this critical gap, in this work, we introduce FedAIoT, an FL benchmark for AIoT. FedAIoT includes eight datasets collected from a wide range of IoT devices. These datasets cover unique IoT modalities and target representative applications of AIoT. FedAIoT also includes a unified end-to-end FL framework for AIoT that simplifies benchmarking the performance of the datasets. Our benchmark results shed light on the opportunities and challenges of FL for AIoT. We hope FedAIoT could serve as an invaluable resource to foster advancements in the important field of FL for AIoT. The repository of FedAIoT is maintained at https://github.com/AIoT-MLSys-Lab/FedAIoT.
Submitted 19 June, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Authors:
Hao Li,
Jingkuan Song,
Lianli Gao,
Xiaosu Zhu,
Heng Tao Shen
Abstract:
Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantics subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet Distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets, MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.
Submitted 14 January, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Updated measurements of the M1 transition $ψ(3686) \to γη_{c}(2S)$ with $η_{c}(2S) \to K \bar{K} π$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (609 additional authors not shown)
Abstract:
Based on a data sample of $(27.08 \pm 0.14 ) \times 10^8~ψ(3686)$ events collected with the BESIII detector at the BEPCII collider, the M1 transition $ψ(3686) \to γη_{c}(2S)$ with $η_{c}(2S) \to K\bar{K}π$ is studied, where $K\bar{K}π$ is $K^{+} K^{-} π^{0}$ or $K_{S}^{0}K^{\pm}π^{\mp}$. The mass and width of the $η_{c}(2S)$ are measured to be $(3637.8 \pm 0.8 (\rm {stat}) \pm 0.2 (\rm {syst}))$ MeV/$c^{2}$ and $(10.5 \pm 1.7 (\rm {stat}) \pm 3.5 (\rm {syst}))$ MeV, respectively. The product branching fraction $\mathcal{B}\left(ψ(3686) \rightarrow γη_{c}(2 S)\right) \times \mathcal{B}(η_{c}(2 S) \rightarrow K \bar{K} π)$ is determined to be $(0.97 \pm 0.06 (\rm {stat}) \pm 0.09 (\rm {syst})) \times 10^{-5}$. Using $\mathcal{BR}(η_{c}(2S)\to K\bar{K}π)=(1.86^{+0.68}_{-0.49})\%$, we obtain the branching fraction of the radiative transition to be $\mathcal{BR}(ψ(3686) \to γη_{c}(2S)) = (5.2 \pm 0.3 (\rm {stat}) \pm 0.5 (\rm {syst}) ^{+1.9}_{-1.4} (extr)) \times 10^{-4}$, where the third uncertainty is due to the quoted $\mathcal{BR}(η_{c}(2S) \to K\bar{K}π)$.
Submitted 26 September, 2023;
originally announced September 2023.
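The last step of the abstract above is a division: the product branching fraction is divided by the quoted $\mathcal{BR}(η_{c}(2S)\to K\bar{K}π)$. The sketch below redoes that arithmetic with the numbers from the abstract. The variable names are illustrative, and the way the asymmetric input uncertainty is propagated (shifting the denominator by its upper and lower errors) is an assumption, not a procedure stated by the authors.

```python
# Numbers quoted in the abstract; names are illustrative.
product_bf = 0.97e-5      # B(psi(3686) -> gamma eta_c(2S)) x B(eta_c(2S) -> K Kbar pi)
product_stat = 0.06e-5    # statistical uncertainty on the product
product_syst = 0.09e-5    # systematic uncertainty on the product
br_kkpi = 1.86e-2         # quoted BR(eta_c(2S) -> K Kbar pi)
br_kkpi_up = 0.68e-2      # its upper uncertainty
br_kkpi_down = 0.49e-2    # its lower uncertainty


def radiative_branching_fraction():
    """Divide the product branching fraction by the quoted input BR."""
    central = product_bf / br_kkpi
    stat = product_stat / br_kkpi
    syst = product_syst / br_kkpi
    # Assumption: propagate the asymmetric input error by shifting the
    # denominator; a smaller input BR gives a larger result, and vice versa.
    ext_up = product_bf / (br_kkpi - br_kkpi_down) - central
    ext_down = central - product_bf / (br_kkpi + br_kkpi_up)
    return central, stat, syst, ext_up, ext_down


central, stat, syst, ext_up, ext_down = radiative_branching_fraction()
print(
    f"BR = ({central * 1e4:.1f} +/- {stat * 1e4:.1f} (stat) "
    f"+/- {syst * 1e4:.1f} (syst) +{ext_up * 1e4:.1f} -{ext_down * 1e4:.1f} (extr)) x 10^-4"
)
```

Under these assumptions the result agrees with the value quoted in the abstract at the displayed precision.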
-
Investigation of the $ΔI = 1/2$ rule and test of CP violation through the measurement of decay asymmetry parameters in $Ξ^-$ decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (604 additional authors not shown)
Abstract:
Using $(10087\pm44)\times 10^{6}$ $J/ψ$ events collected with the BESIII detector, numerous $Ξ^-$ and $Λ$ decay asymmetry parameters are simultaneously determined from the process $J/ψ\to Ξ^- \barΞ^+ \to Λ(pπ^-) π^- \barΛ(\bar{n} π^0) π^+$ and its charge-conjugate channel. The precisions of $α_0$ for $Λ\to nπ^0$ and $\barα_0$ for $\barΛ \to \bar{n}π^0$ compared to world averages are improved by factors of 4 and 1.7, respectively. The ratio of decay asymmetry parameters of $Λ\to nπ^0$ to that of $Λ\to pπ^-$, $\langle α_0 \rangle/ \langle α_{Λ-} \rangle $, is determined to be $ 0.873 \pm 0.012^{+0.011}_{-0.010}$, where the first and the second uncertainties are statistical and systematic, respectively. The ratio is smaller than unity by more than $5σ$, which signifies the existence of the $ΔI = 3/2$ transition in $Λ$ for the first time. In addition, we test for CP violation in $Ξ^- \to Λπ^-$ and in $Λ\to n π^{0}$ with the best precision to date.
Submitted 8 January, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Efficient Post-training Quantization with FP8 Formats
Authors:
Haihao Shen,
Naveen Mellempudi,
Xin He,
Qun Gao,
Chang Wang,
Mengni Wang
Abstract:
Recent advances in deep learning methods such as LLMs and Diffusion models have created a need for improved quantization methods that can meet the computational demands of these modern architectures while maintaining accuracy. Towards this goal, we study the advantages of FP8 data formats for post-training quantization across 75 unique network architectures covering a wide range of tasks, including machine translation, language modeling, text generation, image classification, generation, and segmentation. We examine three different FP8 representations (E5M2, E4M3, and E3M4) to study the effects of varying degrees of trade-off between dynamic range and precision on model accuracy. Based on our extensive study, we developed a quantization workflow that generalizes across different network architectures. Our empirical results show that FP8 formats outperform INT8 in multiple aspects, including workload coverage (92.64% vs. 65.87%), model accuracy and suitability for a broader range of operations. Furthermore, our findings suggest that E4M3 is better suited for NLP models, whereas E3M4 performs marginally better than E4M3 on computer vision tasks. The code is publicly available on Intel Neural Compressor: https://github.com/intel/neural-compressor.
Submitted 31 March, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
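The trade-off between dynamic range and precision mentioned in the abstract above can be made concrete by computing the limits implied by each exponent/mantissa split. The sketch below is illustrative only: the function name is hypothetical, and it assumes IEEE-754-style conventions for all three formats (exponent bias of 2^(e-1) - 1, top exponent code reserved for infinities and NaNs). Actual FP8 definitions may reclaim the top exponent code for finite values, which raises the maximum, so these numbers should be read as a comparison of the bit splits rather than as the exact formats used in the paper.

```python
def fp8_limits(exp_bits, man_bits):
    """Limits of a 1-sign / exp_bits / man_bits float under IEEE-style rules.

    Assumption: the all-ones exponent code is reserved (inf/NaN), which
    real FP8 variants do not always do.
    """
    bias = 2 ** (exp_bits - 1) - 1
    max_exponent = (2 ** exp_bits - 2) - bias          # largest usable exponent
    return {
        "max_normal": (2 - 2.0 ** -man_bits) * 2.0 ** max_exponent,
        "min_normal": 2.0 ** (1 - bias),
        "min_subnormal": 2.0 ** (1 - bias - man_bits),
        "epsilon": 2.0 ** -man_bits,                   # spacing just above 1.0
    }


formats = {"E5M2": (5, 2), "E4M3": (4, 3), "E3M4": (3, 4)}
limits = {name: fp8_limits(e, m) for name, (e, m) in formats.items()}

for name, stats in limits.items():
    print(name, stats)
```

Moving a bit from the exponent to the mantissa halves the relative spacing (`epsilon`) but shrinks the representable range sharply, which is the trade-off the three formats span.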
-
Measurement of the $e^{+}e^{-} \to K_{S}^{0} K_{L}^{0} π^{0}$ cross sections from $\sqrt{s}=$ 2.000 to 3.080 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (604 additional authors not shown)
Abstract:
Based on $e^{+}e^{-}$ collision data collected at center-of-mass energies from 2.000 to 3.080 GeV by the BESIII detector at the BEPCII collider, a partial wave analysis is performed for the process $e^{+}e^{-}\to K_{S}^{0} K_{L}^{0} π^{0}$. The results allow the Born cross sections of the process $e^{+}e^{-}\to K_{S}^{0} K_{L}^{0} π^{0}$, as well as its subprocesses $e^{+}e^{-}\to K^{*}(892)^{0}\bar{K}^{0}$ and $K^{*}_{2}(1430)^{0}\bar{K}^{0}$ to be measured. The Born cross sections for $e^{+}e^{-}\to K_{S}^{0}K_{L}^{0}π^{0}$ are consistent with previous measurements by BaBar, but with substantially improved precision. The Born cross section lineshape of the process $e^{+}e^{-}\to K^{*}(892)^{0}\bar{K}^{0}$ is consistent with a vector meson state around 2.2 GeV with a significance of 3.2$σ$. A Breit-Wigner fit determines its mass as $M_Y=(2164.7\pm9.1\pm3.1)~{\rm{MeV}}/c^{2}$ and its width as $Γ_{Y}=(32.4\pm21.0\pm1.8)~\rm{MeV}$.
Submitted 26 February, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
A rigorous model reduction for the anisotropic-scattering transport process
Authors:
Yuan Hu,
Chang Liu,
Huayun Shen
Abstract:
In this letter, we propose a reduced-order model to bridge the particle transport mechanics and the macroscopic fluid dynamics in the highly scattered regime. A rigorous mathematical derivation and a concise physical interpretation are presented for an anisotropic-scattering transport process with arbitrary order of scattering kernel. The prediction of the theoretical model perfectly agrees with the numerical experiments. A clear picture of the diffusion physics is revealed for the neutral particle transport in the asymptotic optically thick regime.
Submitted 21 September, 2023;
originally announced September 2023.
-
DePT: Decoupled Prompt Tuning
Authors:
Ji Zhang,
Shihan Wu,
Lianli Gao,
Heng Tao Shen,
Jingkuan Song
Abstract:
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the better the tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, so as to maximally preserve task-shared knowledge in the original feature space for achieving better zero-shot generalization on new tasks. Importantly, our DePT is orthogonal to existing prompt tuning methods, hence it can improve all of them. Extensive experiments on 11 datasets show the strong flexibility and effectiveness of DePT. Our code and pretrained models are available at https://github.com/Koorye/DePT.
Submitted 19 March, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Measurements of the absolute branching fractions of $Ω^-$ decays and test of the $ΔI = 1/2$ rule
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (599 additional authors not shown)
Abstract:
Based on a data set of $(27.12\pm0.10)\times 10^8$ $ψ(3686)$ events collected at the BESIII experiment, the absolute branching fractions of the three dominant $Ω^-$ decays are measured to be $\mathcal{B}_{Ω^- \to Ξ^0 π^-} = (25.03\pm0.44\pm0.53)\%$, $\mathcal{B}_{Ω^- \to Ξ^- π^0} = (8.43\pm0.52\pm0.28)\%$, and $\mathcal{B}_{Ω^- \to ΛK^-} = (66.3\pm0.8\pm2.0)\%$, where the first and second uncertainties are statistical and systematic, respectively. The ratio between $\mathcal{B}_{Ω^- \to Ξ^0 π^-}$ and $\mathcal{B}_{Ω^- \to Ξ^- π^0}$ is determined to be $2.97\pm0.19\pm0.11$, which is in good agreement with the PDG value of $2.74\pm0.15$, but greater by more than four standard deviations than the theoretical prediction of 2 obtained from the $ΔI = 1/2$ rule.
Submitted 12 September, 2023;
originally announced September 2023.
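The comparison at the end of the abstract above can be checked with a few lines of arithmetic. The sketch below recomputes the ratio of the two branching fractions and its distance from the $ΔI = 1/2$ prediction of 2 and from the PDG value. Combining the statistical and systematic uncertainties in quadrature is an assumption made here for illustration; the abstract does not state how the significance was evaluated.

```python
import math

# Values quoted in the abstract (branching fractions in percent).
bf_xi0_pim = 25.03                 # B(Omega- -> Xi0 pi-)
bf_xim_pi0 = 8.43                  # B(Omega- -> Xi- pi0)
ratio_quoted, ratio_stat, ratio_syst = 2.97, 0.19, 0.11
pdg_ratio, pdg_error = 2.74, 0.15
delta_i_half_prediction = 2.0

# Central value recomputed from the two branching fractions.
ratio = bf_xi0_pim / bf_xim_pi0

# Assumption: total uncertainty is the quadrature sum of stat and syst.
ratio_error = math.hypot(ratio_stat, ratio_syst)

# Distance from the Delta I = 1/2 prediction, in standard deviations.
pull_theory = (ratio_quoted - delta_i_half_prediction) / ratio_error

# Distance from the PDG value, including the PDG uncertainty.
pull_pdg = (ratio_quoted - pdg_ratio) / math.hypot(ratio_error, pdg_error)

print(f"ratio = {ratio:.2f}, total error = {ratio_error:.2f}")
print(f"pull vs. prediction = {pull_theory:.1f} sigma, pull vs. PDG = {pull_pdg:.1f} sigma")
```

With these inputs the recomputed ratio rounds to the quoted 2.97, the deviation from 2 exceeds four standard deviations, and the difference from the PDG value stays below one standard deviation, consistent with the wording of the abstract.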
-
Observation of $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ in the amplitude analysis of $D^{+} \to K_{S}^{0}π^+η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (604 additional authors not shown)
Abstract:
We perform for the first time an amplitude analysis of the decay $D^{+}\to K_{S}^{0}π^+η$ and report the observation of the decay $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ using 2.93 fb$^{-1}$ of $e^+e^-$ collision data taken at a center-of-mass energy of 3.773 GeV with the BESIII detector. As the only W-annihilation free decay among $D$ to $a_{0}(980)$-pseudoscalar, $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ is the ideal decay to extract the contributions of the external and internal $W$-emission amplitudes involving $a_{0}(980)$ and study the final-state interactions. The absolute branching fraction of $D^{+}\to K_{S}^{0}π^+η$ is measured to be $(1.27\pm0.04_{\rm stat.}\pm0.03_{\rm syst.})\%$. The product branching fractions of $D^{+}\to K_{S}^{0}a_{0}(980)^{+}$ with $a_{0}(980)^{+}\to π^+η$ and $D^{+}\to π^+ K_0^*(1430)^0$ with $K_0^*(1430)^0\to K_{S}^{0}η$ are measured to be $(1.33\pm0.05_{\rm stat.}\pm0.04_{\rm syst.})\%$ and $(0.14\pm0.03_{\rm stat.}\pm0.01_{\rm syst.})\%$, respectively.
Submitted 29 March, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Authors:
Wenhua Cheng,
Weiwei Zhang,
Haihao Shen,
Yiyang Cai,
Xin He,
Kaokao Lv,
Yi Liu
Abstract:
Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as a promising solution to address these challenges. Previous research suggests that fine-tuning through up and down rounding can enhance performance. In this study, we introduce SignRound, a method that utilizes signed gradient descent (SignSGD) to optimize rounding values and weight clipping within just 200 steps. SignRound integrates the advantages of Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), achieving exceptional results across 2 to 4 bits while maintaining low tuning costs and avoiding additional inference overhead. For example, SignRound achieves absolute average accuracy improvements ranging from 6.91\% to 33.22\% at 2 bits. It also demonstrates robust generalization to recent models and achieves near-lossless quantization in most scenarios at 4 bits. The source code is publicly available at \url{https://github.com/intel/auto-round}.
Submitted 23 May, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
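The idea in the abstract above, tuning the rounding decision with signed gradient descent, can be illustrated on a toy problem. The following is a minimal pure-Python sketch, not the authors' implementation: it handles a single output neuron, uses one fixed per-tensor scale, omits the weight-clipping part entirely, and approximates the gradient through `round` with a straight-through estimator. All function names, the 4-bit range, and the step size `1 / steps` are assumptions chosen for illustration.

```python
import random


def quantize(w, scale, offset, qmin=-8, qmax=7):
    """Fake-quantize one weight: scale * clip(round(w / scale + offset))."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    q = round(w / scale + offset)
    return scale * max(qmin, min(qmax, q))


def output_error(weights, offsets, inputs, scale):
    """Squared error between full-precision and quantized outputs."""
    quantized = [quantize(w, scale, v) for w, v in zip(weights, offsets)]
    total = 0.0
    for x in inputs:
        diff = sum((wq - w) * xi for w, wq, xi in zip(weights, quantized, x))
        total += diff * diff
    return total


def sign(value):
    return (value > 0) - (value < 0)


def tune_rounding(weights, inputs, scale, steps=200):
    """Tune per-weight rounding offsets in [-0.5, 0.5] with sign updates."""
    lr = 1.0 / steps                                   # assumed step size
    offsets = [0.0] * len(weights)
    best_offsets = list(offsets)
    best_loss = output_error(weights, offsets, inputs, scale)
    for _ in range(steps):
        quantized = [quantize(w, scale, v) for w, v in zip(weights, offsets)]
        grads = [0.0] * len(weights)
        for x in inputs:
            diff = sum((wq - w) * xi for w, wq, xi in zip(weights, quantized, x))
            for j, xj in enumerate(x):
                # Straight-through estimator: treat d(quantized)/d(offset) as scale.
                grads[j] += 2.0 * scale * diff * xj
        # Only the sign of the gradient is used, as in SignSGD.
        offsets = [max(-0.5, min(0.5, v - lr * sign(g))) for v, g in zip(offsets, grads)]
        loss = output_error(weights, offsets, inputs, scale)
        if loss < best_loss:
            best_loss, best_offsets = loss, list(offsets)
    return best_offsets, best_loss


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(16)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
scale = max(abs(w) for w in weights) / 7.0

baseline_loss = output_error(weights, [0.0] * len(weights), inputs, scale)
tuned_offsets, tuned_loss = tune_rounding(weights, inputs, scale)
print(f"round-to-nearest error: {baseline_loss:.4f}, tuned error: {tuned_loss:.4f}")
```

Because the loop keeps the best offsets seen so far, starting from plain round-to-nearest, the tuned error can never be worse than the baseline on the calibration inputs. Whether it is strictly better depends on the data.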
-
Observation of the Singly Cabibbo-Suppressed Decay $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (605 additional authors not shown)
Abstract:
The singly Cabibbo-suppressed decay $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$ is observed for the first time with a statistical significance of $6.4σ$ by using 4.5 fb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies between 4.600 and 4.699 GeV with the BESIII detector at BEPCII. The absolute branching fraction of $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$ is measured to be $(3.8\pm1.3_{\rm stat}\pm0.2_{\rm syst})\times 10^{-4}$ in a model-independent approach. This is the first observation of a Cabibbo-suppressed $Λ_{c}^{+}$ decay involving $Σ^-$ in the final state. The ratio of branching fractions between $Λ_{c}^{+}\to Σ^{-}K^{+}π^{+}$ and the Cabibbo-favored decay $Λ_{c}^{+}\to Σ^- π^+π^+$ is calculated to be $(0.4 \pm 0.1)s_{c}^{2}$, where $s_{c} \equiv \sinθ_c = 0.2248$ with $θ_c$ the Cabibbo mixing angle. This ratio significantly deviates from $1.0s_{c}^{2}$ and provides important information for the understanding of nonfactorization contributions in $Λ_{c}^{+}$ decays.
Submitted 8 May, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Measurement of the cross section of $e^+e^-\rightarrowΞ^{-}\barΞ^{+}$ at center-of-mass energies between 3.510 and 4.843 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (599 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data corresponding to a total integrated luminosity of 12.9 $fb^{-1}$ collected with the BESIII detector at the BEPCII collider, the exclusive Born cross sections and the effective form factors of the reaction $e^+e^-\rightarrowΞ^{-}\barΞ^{+}$ are measured via the single baryon-tag method at 23 center-of-mass energies between 3.510 and 4.843 GeV. Evidence for the decay $ψ(3770)\rightarrowΞ^{-}\barΞ^{+}$ is observed with a significance of 4.5$σ$ by analyzing the measured cross sections together with earlier BESIII results. For the other charmonium(-like) states $ψ(4040)$, $ψ(4160)$, $Y(4230)$, $Y(4360)$, $ψ(4415)$, and $Y(4660)$, no significant signal of their decay to $Ξ^-\bar Ξ^+$ is found. For these states, upper limits of the products of the branching fraction and the electronic partial width at the 90% confidence level are provided.
Submitted 30 November, 2023; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Novel method to extract the femtometer structure of strange baryons using the vacuum polarization effect
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
V. Batozskaya,
D. Becker,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko
, et al. (560 additional authors not shown)
Abstract:
One of the fundamental goals of particle physics is to gain microscopic understanding of the strong interaction. Electromagnetic form factors quantify the structure of hadrons in terms of charge and magnetization distributions. While the nucleon structure has been investigated extensively, data on hyperons is still scarce. It has recently been demonstrated that electron-positron annihilations into hyperon-antihyperon pairs provide a powerful tool to investigate their inner structure. We present a novel method useful for hyperon-antihyperon pairs of different types which exploits the cross section enhancement due to the vacuum polarization effect at the $J/ψ$ resonance. Using the 10 billion $J/ψ$ events collected with the BESIII detector, this allows a thorough determination of the hyperon structure. The result is essentially a precise snapshot of a $\barΛΣ^0$~($Λ\barΣ^0$) pair in the making, encoded in the form factor ratio and the phase. Their values are measured to be $R = 0.860\pm0.029({\rm stat.})\pm0.010({\rm syst.})$, $ΔΦ_1=(1.011\pm0.094({\rm stat.})\pm0.010({\rm syst.}))~\rm rad$ for $\barΛΣ^0$ and $ΔΦ_2=(2.128\pm0.094({\rm stat.})\pm0.010({\rm syst.}))~\rm rad$ for $Λ\barΣ^0$, respectively. Furthermore, charge-parity (CP) breaking is investigated for the first time in this reaction and found to be consistent with CP symmetry.
Submitted 8 September, 2023;
originally announced September 2023.
-
Search for the semileptonic decays $D^+_s \to K_1(1270)^0 e^+ν_e$ and $D^+_s \to b_1(1235)^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (601 additional authors not shown)
Abstract:
By analyzing 7.33\,fb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies between 4.128 and 4.226 GeV with the BESIII detector, we search for the semileptonic decays $D^+_s \to K_1(1270)^0 e^+ν_e$ and $D^+_s \to b_1(1235)^0 e^+ν_e$ for the first time. No significant signals are observed for either decay mode. The upper limits on the (product) branching fractions are determined to be ${\mathcal B}[D^+_s \to K_1(1270)^0 e^+ν_e] < 4.1\times 10^{-4}$ and ${\mathcal B}[D^+_s \to b_1(1235)^0 e^+ν_e]\cdot {\mathcal B}[b_1(1235)^0\to ωπ^0] < 6.4\times 10^{-4}$ at 90\% confidence level.
Submitted 7 September, 2023;
originally announced September 2023.
-
First Measurement of the Decay Asymmetry in the pure W-boson-exchange Decay $Λ_{c}^{+}\toΞ^{0}K^{+}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (618 additional authors not shown)
Abstract:
Based on $4.4~\text{fb}^{-1}$ of $e^{+}e^{-}$ annihilation data collected at the center-of-mass energies between $4.60$ and $4.70~\text{GeV}$ with the BESIII detector at the BEPCII collider, the pure \textit{W}-boson-exchange decay $Λ_{c}^{+}\toΞ^{0}K^{+}$ is studied with a full angular analysis. The corresponding decay asymmetry is measured for the first time to be $α_{Ξ^{0}K^{+}}=0.01\pm0.16({\rm stat.})\pm0.03({\rm syst.})$. This result reflects the non-interference effect between the $S$- and $P$-wave amplitudes. The phase shift between $S$- and $P$-wave amplitudes has two solutions, which are $δ_{p}-δ_{s}=-1.55\pm0.25({\rm stat.})\pm0.05({\rm syst.})~\text{rad}$ or $1.59\pm0.25({\rm stat.})\pm0.05({\rm syst.})~\text{rad}$.
Submitted 20 January, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Robust Recommender System: A Survey and Future Directions
Authors:
Kaike Zhang,
Qi Cao,
Fei Sun,
Yunfan Wu,
Shuchang Tao,
Huawei Shen,
Xueqi Cheng
Abstract:
With the rapid growth of information, recommender systems have become integral for providing personalized suggestions and overcoming information overload. However, their practical deployment often encounters "dirty" data, where noise or malicious information can lead to abnormal recommendations. Research on improving recommender systems' robustness against such dirty data has thus gained significant attention. This survey provides a comprehensive review of recent work on recommender systems' robustness. We first present a taxonomy to organize current techniques for withstanding malicious attacks and natural noise. We then explore state-of-the-art methods in each category, including fraudster detection, adversarial training, certifiable robust training against malicious attacks, and regularization, purification, self-supervised learning against natural noise. Additionally, we summarize evaluation metrics and common datasets used to assess robustness. We discuss robustness across varying recommendation scenarios and its interplay with other properties like accuracy, interpretability, privacy, and fairness. Finally, we delve into open issues and future research directions in this emerging field. Our goal is to equip readers with a holistic understanding of robust recommender systems and spotlight pathways for future research and development.
Submitted 5 September, 2023;
originally announced September 2023.
-
A coupled-channel analysis of the $X(3872)$ lineshape with BESIII data
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (600 additional authors not shown)
Abstract:
We perform a study of the $X(3872)$ lineshape using the data samples of $e^+e^-\toγX(3872)$, $X(3872)\to D^0\bar{D}^0 π^0$ and $π^+π^- J/ψ$ collected with the BESIII detector. The effects of the coupled-channels and the off-shell $D^{*0}$ are included in the parameterization of the lineshape. The lineshape mass parameter is obtained to be $M_{X}=(3871.63\pm 0.13^{+0.06}_{-0.05})$ MeV. Two poles are found on the first and second Riemann sheets corresponding to the $D^{*0}\bar{D}^0$ branch cut. The pole location on the first sheet is much closer to the $D^{*0}\bar{D}^0$ threshold than the other, and is determined to be $7.04\pm0.15^{+0.07}_{-0.08}$ MeV above the $D^0\bar{D}^0π^0$ threshold with an imaginary part $-0.19\pm0.08^{+0.14}_{-0.19}$ MeV.
Submitted 4 September, 2023;
originally announced September 2023.
-
Consistency of Lloyd's Algorithm Under Perturbations
Authors:
Dhruv Patel,
Hui Shen,
Shankar Bhamidi,
Yufeng Liu,
Vladas Pipiras
Abstract:
In the context of unsupervised learning, Lloyd's algorithm is one of the most widely used clustering algorithms. It has inspired a plethora of work investigating the correctness of the algorithm under various settings with ground truth clusters. In particular, in 2016, Lu and Zhou have shown that the mis-clustering rate of Lloyd's algorithm on $n$ independent samples from a sub-Gaussian mixture is exponentially bounded after $O(\log(n))$ iterations, assuming proper initialization of the algorithm. However, in many applications, the true samples are unobserved and need to be learned from the data via pre-processing pipelines such as spectral methods on appropriate data matrices. We show that the mis-clustering rate of Lloyd's algorithm on perturbed samples from a sub-Gaussian mixture is also exponentially bounded after $O(\log(n))$ iterations under the assumptions of proper initialization and that the perturbation is small relative to the sub-Gaussian noise. In canonical settings with ground truth clusters, we derive bounds for algorithms such as $k$-means$++$ to find good initializations and thus leading to the correctness of clustering via the main result. We show the implications of the results for pipelines measuring the statistical significance of derived clusters from data such as SigClust. We use these general results to derive implications in providing theoretical guarantees on the misclustering rate for Lloyd's algorithm in a host of applications, including high-dimensional time series, multi-dimensional scaling, and community detection for sparse networks via spectral clustering.
Submitted 1 September, 2023;
originally announced September 2023.
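For readers unfamiliar with the algorithm analyzed in the abstract above, Lloyd's algorithm alternates between assigning each sample to its nearest center and moving each center to the mean of its assigned samples. The sketch below is a plain-Python illustration on toy data; the function names, the toy points, and the small deterministic perturbation are all assumptions made for the example and are not taken from the paper.

```python
def nearest_center(point, centers):
    """Index of the center closest to `point` in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centers[k])),
    )


def lloyd(points, initial_centers, max_iterations=100):
    """Run Lloyd's iterations until the centers stop changing."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iterations):
        # Assignment step: label each point with its nearest center.
        labels = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest_center(p, centers) for p in points]


# Two well-separated toy clusters and a slightly perturbed copy of them.
clean = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
perturbed = [(x + 0.2 * (-1) ** i, y - 0.2 * (-1) ** i) for i, (x, y) in enumerate(clean)]
start = [(0.0, 0.0), (10.0, 10.0)]

clean_centers, clean_labels = lloyd(clean, start)
perturbed_centers, perturbed_labels = lloyd(perturbed, start)
print(clean_labels, perturbed_labels)
```

In this toy setting the perturbation is small compared with the separation between the clusters, so both runs recover the same labels. The paper's contribution is a bound on how often this fails in general, which the sketch does not attempt to reproduce.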
-
Observation of a vector charmoniumlike state at 4.7 ${\rm GeV}/c^2$ and search for $Z_{cs}$ in $e^+e^-\to K^+K^-J/ψ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (599 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of 5.85~fb$^{-1}$ collected at center-of-mass energies from 4.61 to 4.95 GeV with the BESIII detector operating at the BEPCII storage ring, we measure the cross section for the process $e^+e^-\to K^+K^-J/ψ$. A new resonance with a mass of $M = 4708_{-15}^{+17}\pm21$ MeV/$c^{2}$ and a width of $Γ= 126_{-23}^{+27}\pm30$ MeV is observed in the energy-dependent line shape of the $e^+e^-\to K^+K^-J/ψ$ cross section with a significance over $5σ$. The $K^{+}J/ψ$ system is also investigated to search for charged charmoniumlike states, but no significant $Z_{cs}^+$ states are observed. Upper limits on the Born cross sections for $e^+e^-\to K^{-} Z_{cs}(3985)^{+}/K^{-} Z_{cs}(4000)^{+} + c.c.$ with $Z_{cs}(3985)^{\pm}/Z_{cs}(4000)^{\pm}\to K^{\pm} J/ψ$ are reported at 90\% confidence levels. The ratio of branching fractions $\frac{\mathcal{B}(Z_{cs}(3985)^{+}\to K^+ J/ψ)}{\mathcal{B}(Z_{cs}(3985)^{+}\to (\bar{D}^{0}D_s^{*+} + \bar{D}^{*0}D_s^+))}$ is measured to be less than 0.03 at 90\% confidence level.
Submitted 24 November, 2023; v1 submitted 29 August, 2023;
originally announced August 2023.
-
MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly Detection
Authors:
Yixuan Zhou,
Xing Xu,
Jingkuan Song,
Fumin Shen,
Heng Tao Shen
Abstract:
Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications, where only anomaly-free samples are available for training. Some UAD applications intend to further locate the anomalous regions without any anomaly information.
Although the absence of anomalous samples and annotations deteriorates the UAD performance, an inconspicuous yet powerful statistical model, the normalizing flow, is appropriate for anomaly detection and localization in an unsupervised fashion. The flow-based probabilistic models, only trained on anomaly-free data, can efficiently distinguish unpredictable anomalies by assigning them much lower likelihoods than normal data.
Nevertheless, the size variation of unpredictable anomalies introduces another inconvenience to the flow-based methods for high-precision anomaly detection and localization. To generalize across anomaly size variation, we propose a novel Multi-Scale Flow-based framework dubbed MSFlow composed of asymmetrical parallel flows followed by a fusion flow to exchange multi-scale perceptions. Moreover, different multi-scale aggregation strategies are adopted for image-wise anomaly detection and pixel-wise anomaly localization according to the discrepancy between them. The proposed MSFlow is evaluated on three anomaly detection datasets, significantly outperforming existing methods. Notably, on the challenging MVTec AD benchmark, our MSFlow achieves a new state-of-the-art with a detection AUROC score of up to 99.7%, localization AUROC score of 98.8%, and PRO score of 97.1%. The reproducible code is available at https://github.com/cool-xuan/msflow.
Submitted 29 August, 2023;
originally announced August 2023.
-
Study of excited $Ξ$ states in $ψ(3686)\rightarrow{}K^{-}Λ\overlineΞ^{+}+c.c.$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (587 additional authors not shown)
Abstract:
Based on a sample of $(448.1\pm2.9)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector at BEPCII, the decays of $ψ(3686)\to{}K^{-}Λ\overlineΞ^{+} + c.c.$ with $\overlineΞ^+ \to \overlineΛ π^+$, $\overlineΛ\to \overline{p} π^+$ are studied. Two excited hyperons, $Ξ(1690)^-$ and $Ξ(1820)^-$, are observed with large significance ($ \gg 10 σ$) in the $K^{-}Λ$ invariant mass distributions. A partial wave analysis is performed, and the spin-parities of $Ξ(1690)^-$ and $Ξ(1820)^-$ are determined to be $\frac{1}{2}^{-}$ and $\frac{3}{2}^{-}$, respectively. The masses, widths, and product branching fractions of $Ξ(1690)^-$ and $Ξ(1820)^-$ are also measured.
Submitted 28 April, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Evaluating the Robustness to Instructions of Large Language Models
Authors:
Yuansheng Ni,
Sichao Jiang,
Xinyu Wu,
Hui Shen,
Yuli Zhou
Abstract:
Recently, instruction fine-tuning has risen to prominence as a potential method for enhancing the zero-shot capabilities of Large Language Models (LLMs) on novel tasks. This technique has shown an exceptional ability to boost the performance of moderately sized LLMs, sometimes even reaching performance levels comparable to those of much larger model variants. The focus is on the robustness of instruction-tuned LLMs to seen and unseen tasks. We conducted an exploration of six models including Alpaca, Vicuna, WizardLM, and traditional task-oriented models (Flan-T5-XL/XXL, T0++) using real-world relation extraction (RE) datasets as case studies. We carried out a comprehensive evaluation of these instruction-following LLMs, which have been tuned based on open-domain instructions and task-oriented instructions. The main discussion concerns their performance and robustness towards instructions. We have observed that in most cases, the model's performance in dealing with unfamiliar instructions tends to worsen significantly, and the robustness of the model for RE instructions deteriorates compared to QA. Further, we discovered that up until a certain parameter size threshold (3B), the performance of the FLAN-T5 model improves as the parameter count increases. The robustness of different scales of FLAN-T5 models to RE instructions is worse than their robustness to QA instructions.
Submitted 27 November, 2023; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions
Authors:
Fengling Li,
Lei Zhu,
Tianshi Wang,
Jingjing Li,
Zheng Zhang,
Heng Tao Shen
Abstract:
With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval methods struggle to meet the needs of users demanding access to data from various modalities. To address this, cross-modal retrieval has emerged, enabling interaction across modalities, facilitating semantic matching, and leveraging complementarity and consistency between different modal data. Although prior literature undertook a review of the cross-modal retrieval field, it exhibits numerous deficiencies pertaining to timeliness, taxonomy, and comprehensiveness. This paper conducts a comprehensive review of cross-modal retrieval's evolution, spanning from shallow statistical analysis techniques to vision-language pre-training models. Commencing with a comprehensive taxonomy grounded in machine learning paradigms, mechanisms, and models, the paper then delves deeply into the principles and architectures underpinning existing cross-modal retrieval methods. Furthermore, it offers an overview of widely used benchmarks, metrics, and performances. Lastly, the paper probes the prospects and challenges that confront contemporary cross-modal retrieval, while engaging in a discourse on potential directions for further progress in the field. To facilitate the research on cross-modal retrieval, we develop an open-source code repository at https://github.com/BMC-SDNU/Cross-Modal-Retrieval.
Submitted 26 October, 2023; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Search for the light hadron decay $χ_{c1}(3872) \to π^{+}π^{-}η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (600 additional authors not shown)
Abstract:
With a data sample corresponding to an integrated luminosity of 11.5~fb$^{-1}$
collected with the BESIII detector operating at the BEPCII storage ring, for the first time the light hadron decay $χ_{c1}(3872) \rightarrow π^{+}π^{-}η$
is searched for. While no significant signal is observed, the upper limits at the 90\% confidence level for
$σ[e^{+}e^{-} \rightarrow γχ_{c1}(3872)] \mathcal{B}[χ_{c1}(3872) \rightarrow π^{+}π^{-}η]$ at center-of-mass energies from 4.13 to 4.34 GeV are determined.
By normalizing to the $χ_{c1}(3872)\toπ^+π^- J/ψ$ decay channel, a 90\% confidence level upper limit for the branching fraction ratio
$\mathcal{R}=\mathcal{B}[χ_{c1}(3872) \rightarrowπ^{+}π^{-}η]/\mathcal{B}[χ_{c1}(3872) \rightarrow π^{+}π^{-} J/ψ] < 0.12$ is given.
These measurements provide important inputs for understanding the internal structure of the $χ_{c1}(3872)$ resonance.
Submitted 19 January, 2024; v1 submitted 26 August, 2023;
originally announced August 2023.
-
Improved measurement of the branching fractions for $J/ψ\toγπ^0$, $γη$ and $γη^\prime$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (598 additional authors not shown)
Abstract:
Using a data sample of $(1.0087\pm 0.0044)\times 10^{10}$ $J/ψ$ events collected with the BESIII detector, the decays of $J/ψ\toγπ^{0} (η, η^\prime)\toγγγ$ are studied. Newly measured branching fractions are $\mathcal{B}$$(J/ψ\toγπ^{0})$=$(3.34\pm 0.02\pm 0.09)\times 10^{-5}$, $\mathcal{B}$$(J/ψ\toγη)$=$(1.096\pm 0.001\pm0.019)\times 10^{-3}$ and $\mathcal{B}$$(J/ψ\toγη^\prime)$=$(5.40\pm 0.01\pm0.11)\times 10^{-3}$, where the first uncertainties are statistical and the second are systematic. These results are consistent with the world average values within two standard deviations. The ratio of partial widths $Γ(J/ψ\toγη^\prime)/Γ(J/ψ\toγη)$ is measured to be $4.93 \pm 0.13$. The singlet-octet pseudoscalar mixing angle $θ_P$ is determined to be $θ_P = -(22.11 \pm0.26)^\circ$ or $-(19.34 \pm 0.34)^\circ$ with two different phenomenological models.
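As a rough consistency check on the quoted ratio of partial widths: both decays share the same parent, so the total $J/ψ$ width cancels and the ratio of partial widths equals the ratio of branching fractions. The sketch below recomputes it from the central values in the abstract. The uncertainty propagation is an assumption made for illustration (statistical and systematic uncertainties added in quadrature and treated as uncorrelated between the two channels); the analysis itself may treat correlated systematics differently. The function names are hypothetical.

```python
import math

# (central value, statistical uncertainty, systematic uncertainty), as quoted in the abstract.
B_ETA_PRIME = (5.40e-3, 0.01e-3, 0.11e-3)   # B(J/psi -> gamma eta')
B_ETA = (1.096e-3, 0.001e-3, 0.019e-3)      # B(J/psi -> gamma eta)


def total_relative_uncertainty(value, stat, syst):
    """Combine stat and syst in quadrature (assumption) and return the relative size."""
    return math.hypot(stat, syst) / value


def width_ratio(numerator, denominator):
    """Ratio of two branching fractions with naive, uncorrelated error propagation."""
    ratio = numerator[0] / denominator[0]
    relative = math.hypot(
        total_relative_uncertainty(*numerator),
        total_relative_uncertainty(*denominator),
    )
    return ratio, ratio * relative


ratio, sigma = width_ratio(B_ETA_PRIME, B_ETA)
print(f"Gamma(gamma eta') / Gamma(gamma eta) = {ratio:.2f} +/- {sigma:.2f}")
```

Under these assumptions the result rounds to the $4.93 \pm 0.13$ stated in the abstract.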
Submitted 25 August, 2023;
originally announced August 2023.
-
Significant-attributed Community Search in Heterogeneous Information Networks
Authors:
Yanghao Liu,
Fangda Guo,
Bingbing Xu,
Peng Bao,
Huawei Shen,
Xueqi Cheng
Abstract:
Community search is a personalized community discovery problem aimed at finding densely-connected subgraphs containing the query vertex. In particular, the search for communities with high-importance vertices has recently received a great deal of attention. However, existing works mainly focus on conventional homogeneous networks where vertices are of the same type, and are not applicable to heterogeneous information networks (HINs) composed of multi-typed vertices and different semantic relations, such as bibliographic networks. In this paper, we study the problem of high-importance community search in HINs. A novel community model is introduced, named heterogeneous significant community (HSC), to unravel the closely connected vertices of the same type with high attribute values through multiple semantic relationships. An HSC not only maximizes the exploration of indirect relationships across entities of the anchor-type but also incorporates their significance. To search the HSCs, we first develop online algorithms by exploiting both segmented-based meta-path expansion and significance increment. Specifically, a solution space reuse strategy based on structural nesting is designed to boost the efficiency. In addition, we further devise a two-level index to support searching HSCs in optimal time, based on which a space-efficient compact index is proposed. Extensive experiments on real-world large-scale HINs demonstrate that our solutions are effective and efficient for searching HSCs, and the index-based algorithms are 2-4 orders of magnitude faster than online algorithms.
Submitted 25 August, 2023;
originally announced August 2023.
-
CIParsing: Unifying Causality Properties into Multiple Human Parsing
Authors:
Xiaojia Chen,
Xuanhan Wang,
Lianli Gao,
Beitao Chen,
Jingkuan Song,
HenTao Shen
Abstract:
Existing methods of multiple human parsing (MHP) apply statistical models to acquire underlying associations between images and labeled body parts. However, acquired associations often contain many spurious correlations that degrade model generalization, leading statistical models to be vulnerable to visually contextual variations in images (e.g., unseen image styles/external interventions). To tackle this, we present a causality inspired parsing paradigm termed CIParsing, which follows fundamental causal principles involving two causal properties for human parsing (i.e., the causal diversity and the causal invariance). Specifically, we assume that an input image is constructed by a mix of causal factors (the characteristics of body parts) and non-causal factors (external contexts), where only the former ones cause the generation process of human parsing. Since causal/non-causal factors are unobservable, a human parser in the proposed CIParsing is required to construct latent representations of causal factors and learn to enforce the representations to satisfy the causal properties. In this way, the human parser is able to rely on causal factors w.r.t. relevant evidence rather than non-causal factors w.r.t. spurious correlations, thus alleviating model degradation and yielding improved parsing ability. Notably, CIParsing is designed in a plug-and-play fashion and can be integrated into any existing MHP models. Extensive experiments conducted on two widely used benchmarks demonstrate the effectiveness and generalizability of our method.
Submitted 23 August, 2023;
originally announced August 2023.
-
Unlocking Accuracy and Fairness in Differentially Private Image Classification
Authors:
Leonard Berrada,
Soham De,
Judy Hanwen Shen,
Jamie Hayes,
Robert Stanforth,
David Stutz,
Pushmeet Kohli,
Samuel L. Smith,
Borja Balle
Abstract:
Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold standard framework for privacy-preserving training, as it provides formal privacy guarantees. However, compared to their non-private counterparts, models trained with DP often have significantly reduced accuracy. Private classifiers are also believed to exhibit larger performance disparities across subpopulations, raising fairness concerns. The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy preserving machine learning in industry. Here we show that pre-trained foundation models fine-tuned with DP can achieve similar accuracy to non-private classifiers, even in the presence of significant distribution shifts between pre-training data and downstream tasks. We achieve private accuracies within a few percent of the non-private state of the art across four datasets, including two medical imaging benchmarks. Furthermore, our private medical classifiers do not exhibit larger performance disparities across demographic groups than non-private models. This milestone to make DP training a practical and reliable technology has the potential to widely enable machine learning practitioners to train safely on sensitive datasets while protecting individuals' privacy.
Submitted 21 August, 2023;
originally announced August 2023.
-
From Global to Local: Multi-scale Out-of-distribution Detection
Authors:
Ji Zhang,
Lianli Gao,
Bingguang Hao,
Hao Huang,
Jingkuan Song,
Hengtao Shen
Abstract:
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process. Recent progress in representation learning gives rise to distance-based OOD detection that recognizes inputs as ID/OOD according to their relative distances to the training data of ID classes. Previous approaches calculate pairwise distances relying only on global image representations, which can be sub-optimal as the inevitable background clutter and intra-class variation may drive image-level representations from the same ID class far apart in a given representation space. In this work, we overcome this challenge by proposing Multi-scale OOD DEtection (MODE), the first framework leveraging both global visual information and local region details of images to maximally benefit OOD detection. Specifically, we first find that existing models pretrained by off-the-shelf cross-entropy or contrastive losses are unable to capture valuable local representations for MODE, due to the scale-discrepancy between the ID training and OOD detection processes. To mitigate this issue and encourage locally discriminative representations in ID training, we propose Attention-based Local PropAgation (ALPA), a trainable objective that exploits a cross-attention mechanism to align and highlight the local regions of the target objects for pairwise examples. During test-time OOD detection, a Cross-Scale Decision (CSD) function is further devised on the most discriminative multi-scale representations to distinguish ID/OOD data more faithfully. We demonstrate the effectiveness and flexibility of MODE on several benchmarks -- on average, MODE outperforms the previous state-of-the-art by up to 19.24% in FPR and 2.77% in AUROC. Code is available at https://github.com/JimZAI/MODE-OOD.
Submitted 20 August, 2023;
originally announced August 2023.
-
Geometric instability of graph neural networks on large graphs
Authors:
Emily Morris,
Haotian Shen,
Weiling Du,
Muhammad Hamza Sajjad,
Borun Shi
Abstract:
We analyse the geometric instability of embeddings produced by graph neural networks (GNNs). Existing methods are only applicable for small graphs and lack context in the graph domain. We propose a simple, efficient and graph-native Graph Gram Index (GGI) to measure such instability which is invariant to permutation, orthogonal transformation, translation and order of evaluation. This allows us to study the varying instability behaviour of GNN embeddings on large graphs for both node classification and link prediction.
Submitted 28 November, 2023; v1 submitted 19 August, 2023;
originally announced August 2023.
-
DPL: Decoupled Prompt Learning for Vision-Language Models
Authors:
Chen Xu,
Yuhan Zhu,
Guozhen Zhang,
Haocheng Shen,
Yixuan Liao,
Xiaoxin Chen,
Gangshan Wu,
Limin Wang
Abstract:
Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks. However, current methods tend to overfit to seen categories, thereby limiting their generalization ability for unseen classes. In this paper, we propose a new method, Decoupled Prompt Learning (DPL), which reformulates the attention in prompt learning to alleviate this problem. Specifically, we theoretically investigate the collaborative process between prompts and instances (i.e., image patches/text tokens) by reformulating the original self-attention into four separate sub-processes. Through detailed analysis, we observe that certain sub-processes can be strengthened to bolster robustness and generalizability by some approximation techniques. Furthermore, we introduce language-conditioned textual prompting based on decoupled attention to naturally preserve the generalization of text input. Our approach is flexible for both visual and textual modalities, making it easily extendable to multi-modal prompt learning. By combining the proposed techniques, our approach achieves state-of-the-art performance on three representative benchmarks encompassing 15 image recognition datasets, while remaining parameter-efficient. Moreover, our DPL does not rely on any auxiliary regularization task or extra training data, further demonstrating its remarkable generalization ability.
Submitted 19 August, 2023;
originally announced August 2023.
-
Bridged-GNN: Knowledge Bridge Learning for Effective Knowledge Transfer
Authors:
Wendong Bi,
Xueqi Cheng,
Bingbing Xu,
Xiaoqian Sun,
Li Xu,
Huawei Shen
Abstract:
The data-hungry problem, characterized by insufficient and low-quality data, poses obstacles for deep learning models. Transfer learning has been a feasible way to transfer knowledge from high-quality external data of source domains to limited data of target domains, which follows a domain-level knowledge transfer to learn a shared posterior distribution. However, such methods are usually built on strong assumptions, e.g., the domain invariant posterior distribution, which is usually unsatisfied and may introduce noise, resulting in poor generalization ability on target domains. Inspired by Graph Neural Networks (GNNs) that aggregate information from neighboring nodes, we redefine the paradigm as learning a knowledge-enhanced posterior distribution for target domains, namely Knowledge Bridge Learning (KBL). KBL first learns the scope of knowledge transfer by constructing a Bridged-Graph that connects knowledgeable samples to each target sample and then performs sample-wise knowledge transfer via GNNs. KBL is free from strong assumptions and is robust to noise in the source data. Guided by KBL, we propose the Bridged-GNN, including an Adaptive Knowledge Retrieval module to build the Bridged-Graph and a Graph Knowledge Transfer module. Comprehensive experiments on both un-relational and relational data-hungry scenarios demonstrate the significant improvements of Bridged-GNN compared with SOTA methods.
Submitted 18 August, 2023;
originally announced August 2023.
-
Study of $e^+e^-\toηφ$ at center-of-mass energies from 3.773 to 4.600 GeV
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (600 additional authors not shown)
Abstract:
We present a study of the process $e^{+}e^{-}\toηφ$ using data samples collected with the BESIII detector corresponding to an integrated luminosity of 15.03 fb$^{-1}$ at 23 center-of-mass energies from 3.773 to 4.600 GeV. The Born cross sections are measured at each energy and a coherent fit to cross-section lineshape is performed using a Breit-Wigner parametrization to search for charmonium-like vector states. No significant signals of the $Y(4230)$ and $Y(4360)$ resonances are observed.
Submitted 24 October, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Somatomotor-Visual Resting State Functional Connectivity Increases After Two Years in the UK Biobank Longitudinal Cohort
Authors:
Anton Orlichenko,
Kuan-Jui Su,
Qing Tian,
Hui Shen,
Hong-Wen Deng,
Yu-Ping Wang
Abstract:
Purpose: Functional magnetic resonance imaging (fMRI) and functional connectivity (FC) have been used to follow aging in both children and older adults. Robust changes have been observed in children, where high connectivity among all brain regions changes to a more modular structure with maturation. In this work, we examine changes in FC in older adults after two years of aging in the UK Biobank longitudinal cohort. Approach: We process data using the Power264 atlas, then test whether FC changes in the 2,722-subject longitudinal cohort are statistically significant using a Bonferroni-corrected t-test. We also compare the ability of Power264 and UKB-provided, ICA-based FC to determine which of a longitudinal scan pair is older. Results: We find a 6.8\% average increase in SMT-VIS connectivity from younger to older scan (from $ρ=0.39$ to $ρ=0.42$) that occurs in male, female, older subject ($>65$ years old), and younger subject ($<55$ years old) groups. Among all inter-network connections, this average SMT-VIS connectivity is the best predictor of relative scan age, accurately predicting which scan is older 57\% of the time. Using the full FC and a training set of 2,000 subjects, one is able to predict which scan is older 82.5\% of the time using either the full Power264 FC or the UKB-provided ICA-based FC. Conclusions: We conclude that SMT-VIS connectivity increases in the longitudinal cohort, while resting state FC increases generally with age in the cross-sectional cohort. However, we consider the possibility of a change in resting state scanner task between UKB longitudinal data acquisitions.
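The testing procedure named in the Approach can be illustrated with a minimal sketch, assuming a paired comparison of each connection between the first and second scan and a Bonferroni threshold of alpha divided by the number of connections tested. Everything here is hypothetical illustration rather than the authors' pipeline: the data are synthetic, the function names are invented, and the p-value uses a normal approximation to the t distribution (reasonable only for large samples) to keep the example within the standard library.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def paired_t_statistic(first_scan, second_scan):
    """Paired t statistic for the per-subject change (second minus first)."""
    diffs = [b - a for a, b in zip(first_scan, second_scan)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def two_sided_p_value(t_stat):
    """Two-sided p-value using a normal approximation (assumption: large sample)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def bonferroni_reject(p_values, alpha=0.05):
    """Reject a null hypothesis only if its p-value is below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Synthetic example: 20 hypothetical connections, only the first has a real shift.
rng = random.Random(0)
n_subjects, n_connections = 500, 20
p_values = []
for k in range(n_connections):
    shift = 0.03 if k == 0 else 0.0
    first = [rng.gauss(0.39, 0.1) for _ in range(n_subjects)]
    second = [x + shift + rng.gauss(0.0, 0.05) for x in first]
    p_values.append(two_sided_p_value(paired_t_statistic(first, second)))

print(bonferroni_reject(p_values))
```

Dividing alpha by the number of tests keeps the family-wise error rate at or below alpha, at the cost of reduced power when many connections are tested.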
Submitted 25 August, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition
Authors:
Yixuan Zhou,
Yi Qu,
Xing Xu,
Hengtao Shen
Abstract:
Class imbalance is a common challenge in real-world recognition tasks, where the majority of classes have few samples, also known as tail classes. We address this challenge from the perspective of generalization and empirically find that the promising Sharpness-Aware Minimization (SAM) fails to address generalization issues under the class-imbalanced setting. Through investigating this specific type of task, we identify that its generalization bottleneck primarily lies in the severe overfitting for tail classes with limited training data. To overcome this bottleneck, we leverage class priors to restrict the generalization scope of the class-agnostic SAM and propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM). With the guidance of class priors, our ImbSAM specifically improves generalization targeting tail classes. We also verify the efficacy of ImbSAM on two prototypical applications of class-imbalanced recognition: long-tailed classification and semi-supervised anomaly detection, where our ImbSAM demonstrates remarkable performance improvements for tail classes and anomalies. Our code implementation is available at https://github.com/cool-xuan/Imbalanced_SAM.
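For readers unfamiliar with the class-agnostic SAM that this abstract builds on, the sketch below shows one common form of the SAM update on a toy quadratic loss: perturb the weights by a radius rho along the normalized gradient, then descend using the gradient evaluated at the perturbed point. This is not ImbSAM; the class-aware restriction is not implemented because the abstract does not specify it. The toy loss, the function names, and the hyperparameter values are all assumptions chosen for illustration.

```python
import math


def loss(w, curvature):
    """Toy quadratic loss 0.5 * sum(c_i * w_i^2) (hypothetical stand-in for a training loss)."""
    return 0.5 * sum(c * x * x for c, x in zip(curvature, w))


def grad(w, curvature):
    """Gradient of the toy quadratic loss."""
    return [c * x for c, x in zip(curvature, w)]


def sam_step(w, curvature, lr=0.05, rho=0.05):
    """One SAM update: ascend to a nearby worst-case point, then descend from w."""
    g = grad(w, curvature)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)  # at a stationary point there is no direction to perturb along
    # Perturb the weights by rho along the normalized gradient direction.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Apply the gradient computed at the perturbed point to the original weights.
    g_adv = grad(w_adv, curvature)
    return [x - lr * gx for x, gx in zip(w, g_adv)]


curvature = [1.0, 10.0]
w = [1.0, -2.0]
initial_loss = loss(w, curvature)
for _ in range(20):
    w = sam_step(w, curvature)
print(initial_loss, loss(w, curvature))
```

Each step costs two gradient evaluations instead of one, which is the usual price of SAM-style updates.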
Submitted 15 August, 2023;
originally announced August 2023.