-
A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health
Authors:
Biyonka Liang,
Lily Xu,
Aparna Taneja,
Milind Tambe,
Lucas Janson
Abstract:
Public health programs often provide interventions to encourage beneficiary adherence, and effectively allocating interventions is vital for producing the greatest overall health outcomes. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that combines techniques in Bayesian modeling with Thompson sampling in a novel way to flexibly model the complex RMAB settings present in public health program adherence problems, such as context and non-stationarity. BCoR's key strength is the ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which are common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance over a range of experimental settings, including an example based on real-world adherence data that was developed in collaboration with ARMMAN, an NGO in India that runs a large-scale maternal health program, showcasing BCoR's practical utility and potential for real-world deployment.
Submitted 27 May, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
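The abstract above names Thompson sampling as one ingredient of BCoR. As a rough illustration of that ingredient only, the following minimal sketch runs Beta-Bernoulli Thompson sampling on independent, stationary arms. It is not the authors' method: it has no restless dynamics, no context, and no information sharing between arms, and the function name `thompson_sampling`, the Beta(1, 1) prior, and the example success probabilities are all assumptions made for this sketch.

```python
import random


def thompson_sampling(success_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on independent arms (illustrative only).

    success_probs are hypothetical true success rates, hidden from the learner.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    alpha = [1.0] * n_arms  # Beta(1, 1) prior: uniform over each arm's success rate
    beta = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Draw one plausible success rate per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < success_probs[arm] else 0
        # Conjugate update: successes go to alpha, failures to beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
```

Sampling from the posterior, rather than using its mean, is what lets the procedure keep exploring arms whose success rates are still uncertain.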
-
Nonlinear Stability of Planar Shock Waves for the 3-D Boltzmann Equation
Authors:
Dingqun Deng,
Lingda Xu
Abstract:
This paper studies the stability and large-time behavior of the three-dimensional (3-D) Boltzmann equation near shock profiles. We prove the nonlinear stability of the composite wave consisting of two shock profiles under general perturbations without the assumption of integral zero of macroscopic quantities. To address the challenge caused by the compressibility of shock profiles, we apply the method of anti-derivative based on macro-micro decomposition. However, the system of anti-derivatives presents certain difficulties. Firstly, general perturbations may generate diffusion waves that evolve and interact with shock profiles, resulting in errors that are not controllable. We therefore introduce a set of coupled diffusion waves to cancel out these poor errors and perform careful estimates on wave interactions. Secondly, we perform diagonalized system estimates to fully exploit the compressibility of shock profiles and control terms that decay slowly. Thirdly, the presence of diffusion waves causes critical terms with decay $(1+t)^{-1}$, and we introduce a Poincaré type of inequality to address these terms. Finally, estimates on anti-derivatives can only control terms along the propagation direction, while for transversal directions, we use the entropy-entropy flux pair as well as the Poincaré inequality to control the lower order terms using diffusion terms. As a result, we obtain nonlinear stability through the energy method, which is the first stability result for the planar shock of the multi-dimensional Boltzmann equation to the best of our knowledge.
Submitted 7 February, 2024;
originally announced February 2024.
-
IMUSE: IMU-based Facial Expression Capture
Authors:
Youjia Wang,
Yiwen Wu,
Hengan Zhou,
Hongyang Lin,
Xingyue Peng,
Yingwenqi Jiang,
Yingsheng Zhu,
Guanpeng Long,
Yatu Zhang,
Jingya Wang,
Lan Xu,
Jingyi Yu
Abstract:
For facial motion capture and analysis, the dominant solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) are a potential remedy, yet they are mainly adopted for full-body motion capture. In this paper, we propose IMUSE to fill the gap, a novel path for facial expression capture using purely IMU signals, significantly distant from previous visual solutions. The key design in IMUSE has three parts. We first design micro-IMUs to suit facial capture, accompanied by an anatomy-driven IMU placement scheme. Then, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. Such unique multi-modality brings huge potential for future directions like IMU-based facial behavior analysis. Moreover, utilizing IMU-ARKit, we introduce a strong baseline approach to accurately predict facial blendshape parameters from purely IMU signals. The IMUSE framework enables accurate facial capture in scenarios where visual methods falter while simultaneously safeguarding user privacy. We conduct extensive experiments on both the IMU configuration and the technical components to validate the effectiveness of our IMUSE approach. Notably, IMUSE enables various novel applications, e.g., facial capture under occlusion or in a moving performance. We will release our dataset and implementations to open up more possibilities for facial capture and analysis in our community.
Submitted 12 June, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Precise Measurement of Born Cross Sections for $e^+e^-\to D\bar{D}$ and Observation of One Structure between $\sqrt{s} = 3.80-4.95$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (604 additional authors not shown)
Abstract:
Using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies ranging from 3.80 to 4.95 GeV, corresponding to an integrated luminosity of 20 fb$^{-1}$, a measurement of Born cross sections for the $e^+e^-\to D^{0}\bar{D}^{0}$ and $D^{+}D^{-}$ processes is presented with unprecedented precision. By performing a simultaneous fit to the dressed cross sections for both processes, one possible new structure around 3.9 GeV/$c^2$ is observed for the first time, in addition to seven known resonances $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $Y(4230)$, $Y(4360)$, $ψ(4415)$, and $Y(4660)$. These results offer crucial experimental insights into the nature of hadron production in the open charm region.
Submitted 6 February, 2024;
originally announced February 2024.
-
Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets
Authors:
Lei Xu,
Moncef Gabbouj
Abstract:
Anomalous crack region detection is a typical binary semantic segmentation task, which aims to automatically detect pixels representing cracks in pavement surface images. Although existing deep learning-based methods have achieved outstanding results on specific public pavement datasets, their performance deteriorates dramatically on imbalanced datasets. The input datasets used in such tasks suffer from severe between-class imbalance; hence, obtaining robust performance on diverse pavement datasets with generic deep learning models is a core challenge. To address this problem, we propose a deep learning framework based on conditional Generative Adversarial Networks (cGANs) for anomalous crack region detection at the pixel level. In particular, the proposed framework, which contains a cGAN and a novel auxiliary network, is developed to enhance and stabilize the generator's performance under two alternating training stages when iteratively estimating a multiscale probability feature map from heterogeneous and imbalanced inputs. Moreover, several attention mechanisms and entropy strategies are incorporated into the cGAN architecture and the auxiliary network, respectively, to further mitigate the performance deterioration of model training on severely imbalanced datasets. We conduct extensive experiments on six accessible pavement datasets. The experimental results from both visual and quantitative evaluation show that the proposed framework achieves state-of-the-art results on these datasets efficiently and robustly without increasing computational complexity.
Submitted 7 March, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement Learning
Authors:
Weiqi Fu,
Lianming Xu,
Xin Wu,
Li Wang,
Aiguo Fei
Abstract:
In achieving effective emergency response, the timely acquisition of environmental information, seamless command data transmission, and prompt decision-making are crucial. This necessitates the establishment of a resilient emergency communication dedicated network, capable of providing communication and sensing services even in the absence of basic infrastructure. In this paper, we propose an Emergency Network with Sensing, Communication, Computation, Caching, and Intelligence (E-SC3I). The framework incorporates mechanisms for emergency computing, caching, integrated communication and sensing, and intelligence empowerment. E-SC3I ensures rapid access to a large user base, reliable data transmission over unstable links, and dynamic network deployment in a changing environment. However, these advantages come at the cost of significant computation overhead. Therefore, we specifically concentrate on emergency computing and propose an adaptive collaborative inference method (ACIM) based on hierarchical reinforcement learning. Experimental results demonstrate our method's ability to achieve rapid inference of AI models with constrained computational and communication resources.
Submitted 3 February, 2024;
originally announced February 2024.
-
Measurement of the Electromagnetic Transition Form-factors in the decays $η'\rightarrowπ^+π^-l^+l^-$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (618 additional authors not shown)
Abstract:
With a sample of $(10087\pm44)\times10^{6}$ $J/ψ$ events accumulated with the BESIII detector, we analyze the decays $η'\rightarrowπ^+π^-l^+l^-(l=e,$ $μ)$ via the process $J/ψ\rightarrowγη'$. The branching fractions are measured to be $\mathcal{B}(η'\rightarrowπ^+π^-e^+e^-)=(2.45\pm0.02(\rm{stat.})\pm0.08(\rm{syst.})) \times10^{-3}$ and $\mathcal{B}(η'\rightarrowπ^+π^-μ^+μ^-)=(2.16\pm0.12(\rm{stat.})\pm0.06(\rm{syst.}))\times10^{-5}$, and the ratio is $\frac{\mathcal{B}(η'\rightarrowπ^{+}π^{-}e^{+}e^{-})}{\mathcal{B}(η'\rightarrowπ^{+}π^{-}μ^{+}μ^{-})} = 113.4\pm0.9(\rm{stat.})\pm3.7(\rm{syst.})$. In addition, by combining the $η'\rightarrowπ^+π^-e^+e^-$ and $η'\rightarrowπ^+π^-μ^+μ^-$ decays, the slope parameter of the electromagnetic transition form factor is measured to be $b_{η'}=1.30\pm0.19\ (\mathrm{GeV}/c^{2})^{-2}$, which is consistent with previous measurements from BESIII and theoretical predictions from the VMD model. The asymmetry in the angle between the $π^+π^-$ and $l^+l^-$ decay planes, which has the potential to reveal the $CP$-violation originating from an unconventional electric dipole transition, is also investigated. The asymmetry parameters are determined to be $\mathcal{A}_{CP}(η'\rightarrowπ^+π^-e^+e^-)=(-0.21\pm0.73(\rm{stat.})\pm0.01(\rm{syst.}))\%$ and $\mathcal{A}_{CP}(η'\rightarrowπ^+π^-μ^+μ^-)=(0.62\pm4.71(\rm{stat.})\pm0.08(\rm{syst.}))\%$, implying that no evidence of $CP$-violation is observed at the present statistics. Finally, an axion-like particle is searched for via the decay $η'\rightarrowπ^+π^-a, a\rightarrow e^+e^-$, and upper limits of the branching fractions are presented for the mass assumptions of the axion-like particle in the range of $0-500\ \mathrm{MeV}/c^{2}$.
Submitted 2 February, 2024;
originally announced February 2024.
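The central values quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the ratio of the two branching fractions and expresses each $CP$ asymmetry in units of its total uncertainty. The function names are hypothetical, and adding the statistical and systematic uncertainties in quadrature is an assumption made here for illustration, not something the abstract states.

```python
import math


def branching_ratio(b_ee, b_mumu):
    """Ratio of the central values of two branching fractions."""
    return b_ee / b_mumu


def pull_from_zero(value, stat, syst):
    """Value divided by its total uncertainty.

    Assumption: stat and syst are independent and combined in quadrature.
    """
    return value / math.hypot(stat, syst)


# Central values quoted in the abstract.
B_EE = 2.45e-3    # B(eta' -> pi+ pi- e+ e-)
B_MUMU = 2.16e-5  # B(eta' -> pi+ pi- mu+ mu-)

ratio = branching_ratio(B_EE, B_MUMU)
print(f"ratio of central values: {ratio:.1f}")  # prints 113.4, matching the quoted ratio

# CP asymmetries in percent: (value, stat, syst) as quoted in the abstract.
for label, (value, stat, syst) in {
    "e+e-": (-0.21, 0.73, 0.01),
    "mu+mu-": (0.62, 4.71, 0.08),
}.items():
    print(f"A_CP({label}) pull from zero: {pull_from_zero(value, stat, syst):+.2f}")
```

Under that quadrature assumption both asymmetries lie well within one standard deviation of zero, in line with the abstract's statement that no evidence of $CP$ violation is observed.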
-
Measurements of the branching fraction ratio $\cal{B}(φ\to μ^+μ^-)/\cal{B}(φ\to e^+e^-)$ with charm meson decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1080 additional authors not shown)
Abstract:
Measurements of the branching fraction ratio ${\cal{B}(φ\to μ^+ μ^-)/\cal{B}(φ\to e^+e^-)}$ with ${D_{s}^{+} \to π^{+} φ}$ and ${D^{+} \to π^{+} φ}$ decays, denoted $R^{s}_{φπ}$ and $R^{d}_{φπ}$, are presented. The analysis is performed using a dataset corresponding to an integrated luminosity of 5.4$\,\rm{fb}^{-1}$ of $pp$ collision data collected with the LHCb experiment. The branching fractions are normalised with respect to the ${B^{+} \to K^{+} J/ψ(\to e^+e^-)}$ and ${B^{+} \to K^{+} J/ψ(\to μ^+μ^-)}$ decay modes. The combination of the results yields $$ R_{φπ} = 1.022 \pm 0.012 \,({\rm stat}) \, \pm 0.048 \,({\rm syst}). $$ The result is compatible with previous measurements of the $φ\to \ell^{+}\ell^{-}$ branching fractions and predictions based on the Standard Model.
Submitted 1 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
Authors:
Linping Xu,
Jiawei Jiang,
Dejun Zhang,
Xianjun Xia,
Li Chen,
Yijian Xiao,
Piao Ding,
Shenyi Song,
Sixing Yin,
Ferdous Sohel
Abstract:
Recently, neural networks have proven to be effective in performing speech coding tasks at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20 ms with no additional latency, which makes it suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3 kbps with Opus at 12 kbps.
Submitted 2 February, 2024;
originally announced February 2024.
-
Dual-Tap Optical-Digital Feedforward Equalization Enabling High-Speed Optical Transmission in IM/DD Systems
Authors:
Yu Guo,
Yangbo Wu,
Zhao Yang,
Lei Xue,
Ning Liang,
Yang Ren,
Zhengrui Tu,
Jia Feng,
Qunbi Zhuge
Abstract:
Intensity-modulation and direct-detection (IM/DD) transmission is widely adopted for high-speed optical transmission scenarios due to its cost-effectiveness and simplicity. However, as the data rate increases, the fiber chromatic dispersion (CD) would induce a serious power fading effect, and direct detection could generate inter-symbol interference (ISI). Moreover, the ISI becomes more severe with the increase of fiber length, thereby highly restricting the transmission distance of IM/DD systems. This paper proposes a dual-tap optical-digital feedforward equalization (DT-ODFE) scheme, which could effectively compensate for CD-induced power fading while maintaining low cost and simplicity. A theoretical channel response is formulated for IM/DD transmission, incorporating a dual-tap optical equalizer, and the theoretical analysis reveals that for an IM/DD transmission using 1371 nm over 10 km standard single-mode fiber (SSMF), frequency notch is removed from 33.7 GHz to 46 GHz. Simulation results show that the DT-ODFE achieves an SNR gain of 2.3 dB over IM/DD systems with symbol-space feedforward equalizer (FFE) alone. As the fiber length increases to 15 km, DT-ODFE performs well, while FFE, decision-feedback equalizer (DFE) and Volterra nonlinear equalizers (VNLE) all fail to compensate for the power fading and the 7% hard-decision FEC limit is not satisfied. For 200 Gb/s/$λ$ PAM-4 over 15 km SSMF, results show that the signal-to-noise ratio (SNR) of the proposed DT-ODFE with optimal coefficients satisfies the 7% hard-decision FEC limit, which uncovers the great potential of the DT-ODFE for high-speed IM/DD systems in LR/FR scenarios.
Submitted 1 February, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Study of $CP$ violation in $B^0_{(s)} \to D K^{*}(892)^0$ decays with $D \to K π( ππ)$, $ ππ( ππ)$, and $KK$ final states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1072 additional authors not shown)
Abstract:
A measurement of $CP$-violating observables associated with the interference of $B^0\to D^0 K^{*}(892)^0$ and $B^0\to \bar{D}^0 K^*(892)^0$ decay amplitudes is performed in the $D^0 \to K^{\mp}π^{\pm}(π^+π^-),$ $D^0 \to π^+π^-(π^+π^-)$, and $D^0\to K^+K^-$ final states using data collected by the LHCb experiment corresponding to an integrated luminosity of $9$ $\text{fb}^{-1}$. $CP$-violating observables related to the interference of $B^0_s\to D^0 \bar{K}^*(892)^0$ and $B_s^0\to \bar{D}^0 \bar{K}^*(892)^0$ are also measured, but no evidence for interference is found. The $B^0$ observables are used to constrain the parameter space of the CKM angle $γ$ and the hadronic parameters $r_{B^0}^{DK^*}$ and $δ_{B^0}^{DK^*}$ with inputs from other measurements. In a combined analysis, these measurements allow for four solutions in the parameter space, only one of which is consistent with the world average.
Submitted 13 May, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Photosynthetic properties assisted by the quantum entanglement in two adjacent pigment molecules
Authors:
Lu-Xin Xu,
Shun-Cai Zhao,
Ling-Fang Li
Abstract:
The quantum dynamics of entanglement has been widely observed in photosynthetic light-harvesting complexes. Unlike previous work, we explore the properties of exciton transport and photosynthesis assisted by quantum entanglement in two adjacent pigment molecules, which are measured by the population dynamics, the $j$-$V$ characteristics, and the output power via a photosynthetic quantum heat engine (QHE) model. The exciton transport dynamics are more robust than those without quantum entanglement, and the photosynthetic characteristics evaluated by the output current and power are shown to be enhanced by quantum entanglement at different ambient temperatures. These results may point toward the possibility of artificial photosynthetic nanostructures inspired by these quantum biological systems.
Submitted 31 January, 2024;
originally announced January 2024.
-
Differentiation of correlated fluctuations in site energy on excitation energy transfer in photosynthetic light-harvesting complexes
Authors:
Lu-Xin Xu,
Shun-Cai Zhao,
Sheng-Nan Zhu,
Lin-Jie Chen
Abstract:
One of the promising approaches to explaining the near-unity photosynthetic efficiency is to investigate the quantum regime of excitation energy transfer (EET). The majority of studies, however, have concluded that different pigment molecules contribute equally to EET, rather than differently. We investigate the roles of different site-energies in EET by evaluating the correlated fluctuations of site-energies in two adjacent pigment molecules (namely Site 1 and Site 2), and we attempt to demonstrate different site roles in EET with the $j$-$V$ characteristics and power via a photosynthetic quantum heat engine (QHE) model rather than an actual photosynthetic protein. The results show that fluctuations at Site 1 (the pigment molecule absorbing solar photons) provide ascending and then descending EET. At Site 2 (the pigment molecule acting as the charge-transfer excited state), the EET is reduced by increments in the correlated fluctuations. Furthermore, when investigating the correlated fluctuations at Site 2, the different gap differences of the output terminal play a positive role in EET, but a sharply decreasing EET process is also achieved with less correlated fluctuations at Site 2 compared to those at Site 1. The findings show that different pigment molecules contribute differently to EET. The significance of this work is that it not only clarifies the roles of different pigment molecules in EET, but also deepens our understanding of the fundamental physics of EET as it transports through the molecular chain in photosynthetic light-harvesting complexes. Furthermore, the results are applicable to EET in organic semiconductors, photovoltaic devices, and quantum networks, when these systems couple to the environment of photons via the vibrational motion of sites in the molecular chain.
Submitted 31 January, 2024;
originally announced January 2024.
-
Measurements of Normalized Differential Cross Sections of Inclusive $η$ Production in $e^{+}e^{-}$ Annihilation at Energy from 2.0000 to 3.6710 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
D. Anderle,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (641 additional authors not shown)
Abstract:
Using data samples collected with the BESIII detector operating at the BEPCII storage ring, the cross section of the inclusive process $e^{+}e^{-} \to η+ X$, normalized by the total cross section of $e^{+}e^{-} \to \text{hadrons}$, is measured at eight center-of-mass energy points from 2.0000 GeV to 3.6710 GeV. These are the first measurements with momentum dependence in this energy region. Our measurement shows a significant discrepancy from calculations with the existing fragmentation functions. To address this discrepancy, a new QCD analysis is performed at the next-to-next-to-leading order with hadron mass corrections and higher twist effects, which can explain both the established high-energy data and our measurements reasonably well.
Submitted 31 January, 2024;
originally announced January 2024.
-
Delayed response to the photovoltaic performance in a double quantum dot photocell with spatially correlated fluctuation
Authors:
Sheng-Nan Zhu,
Shun-Cai Zhao,
Lu-Xin Xu,
Lin-Jie Chen
Abstract:
A viable strategy for enhancing photovoltaic performance in a double quantum dot (DQD) photocell is to comprehend the underlying quantum physical regime of charge transfer. This work explores how photovoltaic performance depends on spatially correlated fluctuation in a DQD photocell. A suggested DQD photocell model was used to examine the effects of spatially correlated fluctuation on charge transfer and output photovoltaic efficiency. The charge transfer process and the process of reaching peak solar efficiency were both significantly delayed as a result of the spatially correlated fluctuation, and the anti-spatial correlation fluctuation also resulted in lower output photovoltaic efficiency. Further results revealed that some structural parameters, such as gap difference and tunneling coefficient within two dots, could suppress the delayed response, and a natural adjustment feature was demonstrated on the delayed response in this DQD photocell model. Subsequent investigation verified that the delayed response was caused by the spatial correlation fluctuation, which slowed the generative process of noise-induced coherence, which had previously been proven to improve quantum photovoltaic performance in quantum photocells. Anti-spatial correlation fluctuation and a hotter ambient thermal environment, by contrast, could diminish the conditions for noise-induced coherence, as demonstrated by the reduced photovoltaic capabilities in this suggested DQD photocell model. As a result, we expect that regulated noise-induced coherence, via spatially correlated fluctuation, will have a major impact on photovoltaic qualities in a DQD photocell system. The discovery of its underlying physical regime of quantum fluctuation will broaden and deepen understanding of quantum features of electron transfer, as well as provide some indications concerning quantum techniques for high-efficiency DQD solar cells.
Submitted 31 January, 2024;
originally announced January 2024.
-
Good and Fast Row-Sparse ah-Symmetric Reflexive Generalized Inverses
Authors:
Gabriel Ponte,
Marcia Fampa,
Jon Lee,
Luze Xu
Abstract:
We present several algorithms aimed at constructing sparse and structured sparse (row-sparse) generalized inverses, with application to the efficient computation of least-squares solutions, for inconsistent systems of linear equations, in the setting of multiple right-hand sides and a rank-deficient constraint matrix. Leveraging our earlier formulations to minimize the 1- and 2,1- norms of generalized inverses that satisfy important properties of the Moore-Penrose pseudoinverse, we develop efficient and scalable ADMM algorithms to address these norm-minimization problems and to limit the number of nonzero rows in the solution. We establish a 2,1-norm approximation result for a local-search procedure that was originally designed for 1-norm minimization, and we compare the ADMM algorithms with the local-search procedure and with general-purpose optimization solvers.
Submitted 25 June, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers
Authors:
Lei Xu,
Sarah Alnegheimish,
Laure Berti-Equille,
Alfredo Cuesta-Infante,
Kalyan Veeramachaneni
Abstract:
In text classification, creating an adversarial example means subtly perturbing a few words in a sentence without changing its meaning, causing it to be misclassified by a classifier. A concerning observation is that a significant portion of adversarial examples generated by existing methods change only one word. This single-word perturbation vulnerability represents a significant weakness in classifiers, which malicious users can exploit to efficiently create a multitude of adversarial examples. This paper studies this problem and makes the following key contributions: (1) We introduce a novel metric $\rho$ to quantitatively assess a classifier's robustness against single-word perturbation. (2) We present SP-Attack, designed to exploit the single-word perturbation vulnerability, which achieves a higher attack success rate and better preserves sentence meaning while reducing computation costs compared to state-of-the-art adversarial methods. (3) We propose SP-Defense, which aims to improve $\rho$ by applying data augmentation in learning. Experimental results on 4 datasets with BERT and distilBERT classifiers show that SP-Defense improves $\rho$ by 14.6% and 13.9% and decreases the attack success rate of SP-Attack by 30.4% and 21.2% on the two classifiers, respectively, and also decreases the attack success rate of existing attack methods that involve multiple-word perturbations.
Submitted 30 January, 2024;
originally announced January 2024.
-
Revisiting Gradient Pruning: A Dual Realization for Defending against Gradient Attacks
Authors:
Lulu Xue,
Shengshan Hu,
Ruizhi Zhao,
Leo Yu Zhang,
Shengqing Hu,
Lichao Sun,
Dezhong Yao
Abstract:
Collaborative learning (CL) is a distributed learning framework that aims to protect user privacy by allowing users to jointly train a model by sharing only their gradient updates. However, gradient inversion attacks (GIAs), which recover users' training data from shared gradients, pose severe privacy threats to CL. Existing defense methods adopt different techniques, e.g., differential privacy, cryptography, and perturbation defenses, to defend against GIAs. Nevertheless, all current defense methods suffer from a poor trade-off between privacy, utility, and efficiency. To mitigate the weaknesses of existing solutions, we propose a novel defense method, Dual Gradient Pruning (DGP), based on gradient pruning, which can improve communication efficiency while preserving the utility and privacy of CL. Specifically, DGP slightly modifies gradient pruning to obtain a stronger privacy guarantee. DGP can also significantly improve communication efficiency, and we provide a theoretical analysis of its convergence and generalization. Our extensive experiments show that DGP can effectively defend against the most powerful GIAs and reduce the communication cost without sacrificing the model's utility.
Submitted 29 January, 2024;
originally announced January 2024.
-
Data and Physics driven Deep Learning Models for Fast MRI Reconstruction: Fundamentals and Methodologies
Authors:
Jiahao Huang,
Yinzhe Wu,
Fanwen Wang,
Yingying Fang,
Yang Nan,
Cagan Alkan,
Lei Xu,
Zhifan Gao,
Weiwen Wu,
Lei Zhu,
Zhaolin Chen,
Peter Lally,
Neal Bangerter,
Kawin Setsompop,
Yike Guo,
Daniel Rueckert,
Ge Wang,
Guang Yang
Abstract:
Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, covering techniques that range from algorithm unrolling models, enhancement-based models, and plug-and-play models to the emerging full spectrum of generative models. We also explore the synergistic integration of data models with physics-based insights, encompassing the advancements in multi-coil hardware accelerations like parallel imaging and simultaneous multi-slice imaging, and the optimization of sampling patterns. We then focus on domain-specific challenges and opportunities, including image redundancy exploitation, image integrity, evaluation metrics, data heterogeneity, and model generalization. This work also discusses potential solutions and future research directions, emphasizing the role of data harmonization and federated learning for further improving the general applicability and performance of these methods in MRI reconstruction.
Submitted 29 January, 2024;
originally announced January 2024.
-
Dropout Concrete Autoencoder for Band Selection on HSI Scenes
Authors:
Lei Xu,
Mete Ahishali,
Moncef Gabbouj
Abstract:
Deep learning-based informative band selection methods on hyperspectral images (HSI) recently have gained intense attention to eliminate spectral correlation and redundancies. However, the existing deep learning-based methods either need additional post-processing strategies to select the descriptive bands or optimize the model indirectly, due to the parameterization inability of discrete variables for the selection procedure. To overcome these limitations, this work proposes a novel end-to-end network for informative band selection. The proposed network is inspired by the advances in concrete autoencoder (CAE) and dropout feature ranking strategy. Different from the traditional deep learning-based methods, the proposed network is trained directly given the required band subset eliminating the need for further post-processing. Experimental results on four HSI scenes show that the proposed dropout CAE achieves substantial and effective performance levels outperforming the competing methods.
Submitted 29 January, 2024;
originally announced January 2024.
-
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance
Authors:
Kai He,
Kaixin Yao,
Qixuan Zhang,
Jingyi Yu,
Lingjie Liu,
Lan Xu
Abstract:
Apparel's significant role in human appearance underscores the importance of garment digitalization for digital human creation. Recent advances in 3D content creation are pivotal for digital human creation. Nonetheless, garment generation from text guidance is still nascent. We introduce a text-driven 3D garment generation framework, DressCode, which aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation. We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns with text guidance. We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments. By leveraging a large language model, our framework generates CG-friendly garments through natural language interaction. It also facilitates pattern completion and texture editing, streamlining the design process through user-friendly interaction. This framework fosters innovation by allowing creators to freely experiment with designs and incorporate unique elements into their work. With comprehensive evaluations and comparisons with other state-of-the-art methods, our method showcases superior quality and alignment with input prompts. User studies further validate our high-quality rendering results, highlighting its practical utility and potential in production settings. Our project page is https://IHe-KaiI.github.io/DressCode/.
Submitted 14 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance
Authors:
Qingcheng Zhao,
Pengyu Long,
Qixuan Zhang,
Dafei Qin,
Han Liang,
Longwen Zhang,
Yingliang Zhang,
Jingyi Yu,
Lan Xu
Abstract:
The synthesis of 3D facial animations from speech has garnered considerable attention. Due to the scarcity of high-quality 4D facial data and well-annotated abundant multi-modality labels, previous methods often suffer from limited realism and a lack of flexible conditioning. We address this challenge through a trilogy. We first introduce Generalized Neural Parametric Facial Asset (GNPFA), an efficient variational auto-encoder mapping facial geometry and images to a highly generalized expression latent space, decoupling expressions and identities. Then, we utilize GNPFA to extract high-quality expressions and accurate head poses from a large array of videos. This yields the M2F-D dataset, a large, diverse, and scan-level co-speech 3D facial animation dataset with well-annotated emotional and style labels. Finally, we propose Media2Face, a diffusion model in GNPFA latent space for co-speech facial animation generation, accepting rich multi-modality guidance from audio, text, and image. Extensive experiments demonstrate that our model not only achieves high fidelity in facial animation synthesis but also broadens the scope of expressiveness and style adaptability in 3D facial animation.
Submitted 30 January, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Observation of topological frequency combs
Authors:
Christopher J. Flower,
Mahmoud Jalali Mehrabad,
Lida Xu,
Gregory Moille,
Daniel G. Suarez-Forero,
Ogulcan Orsel,
Gaurav Bahl,
Yanne Chembo,
Kartik Srinivasan,
Sunil Mittal,
Mohammad Hafezi
Abstract:
On-chip generation of optical frequency combs using nonlinear ring resonators has opened the route to numerous novel applications of combs that were otherwise limited to mode-locked laser systems. Nevertheless, even after more than a decade of development, on-chip nonlinear combs still predominantly rely on the use of single-ring resonators. Recent theoretical investigations have shown that generating combs in a topological array of resonators can provide a new avenue to engineer comb spectra. Here, we experimentally demonstrate the generation of such a novel class of frequency combs, topological frequency combs, in a two-dimensional (2D) lattice of hundreds of nonlinear ring resonators. Specifically, the lattice hosts topological edge states that exhibit fabrication-robust linear dispersion and spatial confinement at the boundary of the lattice. Upon optical pumping of the topological edge band, these unique properties of the edge states lead to the generation of a nested frequency comb that is spectrally confined within the edge bands across $\approx$40 longitudinal modes. Moreover, using spatial imaging of our topological lattice, we confirm that light generated in the comb teeth is indeed spatially confined at the lattice edge, characteristic of linear topological systems. Our results bring together the fields of topological photonics and optical frequency combs, providing an opportunity to explore the interplay between topology and nonlinear systems in a platform compatible with commercially available nanofabrication processes.
Submitted 8 April, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
Authors:
Haochen Tan,
Zhijiang Guo,
Zhan Shi,
Lu Xu,
Zhili Liu,
Yunlong Feng,
Xiaoguang Li,
Yasheng Wang,
Lifeng Shang,
Qun Liu,
Linqi Song
Abstract:
Large Language Models (LLMs) have succeeded remarkably in understanding long-form contents. However, their capability for generating long-form contents, such as reports and articles, has been relatively unexplored and inadequately assessed by existing benchmarks. The prevalent evaluation methods, which predominantly rely on crowdsourcing, are recognized for their labor-intensive nature and lack of efficiency, whereas automated metrics, such as the ROUGE score, demonstrate discordance with human judgment criteria. In this paper, we propose ProxyQA, an innovative framework dedicated to assessing long-text generation. ProxyQA comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers. LLMs are tasked to generate extensive content in response to these meta-questions. By engaging an evaluator and incorporating the generated texts as contextual background, ProxyQA assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions. We examine multiple LLMs, emphasizing ProxyQA's demanding nature as a high-quality assessment tool. Human evaluation demonstrates that the proxy-question method is notably self-consistent and aligns closely with human evaluative standards. The dataset and leaderboard are available at https://proxy-qa.com.
Submitted 4 June, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Observation of structures in the processes $e^+e^-\rightarrowωχ_{c1}$ and $ωχ_{c2}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (608 additional authors not shown)
Abstract:
We present measurements of the Born cross sections for the processes $e^+e^-\rightarrowωχ_{c1}$ and $ωχ_{c2}$ at center-of-mass energies $\sqrt{s}$ from 4.308 to 4.951 GeV. The measurements are performed with data samples corresponding to an integrated luminosity of 11.0 $\rm{fb}^{-1}$ collected with the BESIII detector operating at the BEPCII storage ring. Assuming the $e^+e^-\rightarrowωχ_{c2}$ signals come from a single resonance, the mass and width are determined to be $M=(4413.6\pm9.0\pm0.8)$ MeV/$c^2$ and $Γ=(110.5\pm15.0\pm2.9)$ MeV, respectively, which is consistent with the parameters of the well-established resonance $ψ(4415)$. In addition, we also use one single resonance to describe the $e^+e^-\rightarrowωχ_{c1}$ lineshape, and determine the mass and width to be $M=(4544.2\pm18.7\pm1.7)$ MeV/$c^2$ and $Γ=(116.1\pm33.5\pm1.7)$ MeV, respectively. The structure of this lineshape, observed for the first time, requires further understanding.
Submitted 24 March, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Study of $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ at $\sqrt{s}$ from 2.00 to 3.08 GeV at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (608 additional authors not shown)
Abstract:
With the data samples taken at center-of-mass energies from 2.00 to 3.08 GeV with the BESIII detector at the BEPCII collider, a partial wave analysis of the $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ process is performed. The Born cross sections for $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ and its intermediate processes $e^{+}e^{-}\rightarrowρπ$ and $ρ(1450)π$ are measured as functions of $\sqrt{s}$. The results for $e^{+}e^{-}\rightarrowπ^{+}π^{-}π^{0}$ are consistent with previous results measured with the initial state radiation method within one standard deviation, and improve the uncertainty by a factor of ten. By fitting the line shapes of the Born cross sections for $e^{+}e^{-}\rightarrowρπ$ and $ρ(1450)π$, a structure with mass $M = 2119\pm11\pm15\ {\rm MeV}/c^2$ and width $Γ=69\pm30\pm5\ {\rm MeV}$ is observed with a significance of $5.9σ$, where the first uncertainties are statistical and the second ones are systematic. This structure can be interpreted as an excited $ω$ state.
Submitted 26 January, 2024;
originally announced January 2024.
-
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Authors:
Leyang Xue,
Yao Fu,
Zhan Lu,
Luo Mai,
Mahesh Marina
Abstract:
This paper presents MoE-Infinity, a cost-efficient mixture-of-expert (MoE) serving system that realizes activation-aware expert offloading. MoE-Infinity features sequence-level expert activation tracing, a new approach adept at identifying sparse activations and capturing the temporal locality of MoE inference. By analyzing these traces, MoE-Infinity performs novel activation-aware expert prefetching and caching, substantially reducing the latency overheads usually associated with offloading experts for improved cost performance. Extensive experiments in a cluster show that MoE-Infinity outperforms numerous existing systems and approaches, reducing latency by 4 - 20X and decreasing deployment costs by over 8X for various MoEs. MoE-Infinity's source code is publicly available at https://github.com/TorchMoE/MoE-Infinity
Submitted 25 January, 2024;
originally announced January 2024.
-
ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models
Authors:
Yao Fu,
Leyang Xue,
Yeqi Huang,
Andrei-Octavian Brabete,
Dmitrii Ustiugov,
Yuvraj Patel,
Luo Mai
Abstract:
This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs). ServerlessLLM exploits the substantial capacity and bandwidth of storage and memory devices available on GPU servers, thereby reducing costly remote checkpoint downloads and achieving efficient checkpoint loading. ServerlessLLM achieves this through three main contributions: (i) fast LLM checkpoint loading via a novel loading-optimized checkpoint format design, coupled with an efficient multi-tier checkpoint loading system; (ii) locality-driven LLM inference with live migration, which allows ServerlessLLM to effectively achieve locality-driven server allocation while preserving the low latency of ongoing LLM inference; and (iii) locality-aware server allocation, enabling ServerlessLLM to evaluate the status of each server in a cluster and effectively schedule model startup time to capitalize on local checkpoint placement. Our comprehensive experiments, which include microbenchmarks and real-world traces, show that ServerlessLLM surpasses state-of-the-art systems by 10 - 200X in latency performance when running various LLM inference workloads.
Submitted 25 January, 2024;
originally announced January 2024.
-
Parameter-Efficient Conversational Recommender System as a Language Processing Task
Authors:
Mathieu Ravaut,
Hao Zhang,
Lu Xu,
Aixin Sun,
Yong Liu
Abstract:
Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation. Prior work often utilizes external knowledge graphs for items' semantic information, a language model for dialogue generation, and a recommendation module for ranking relevant items. This combination of multiple components suffers from a cumbersome training process and leads to semantic misalignment issues between dialogue generation and item recommendation. In this paper, we represent items in natural language and formulate CRS as a natural language processing task. Accordingly, we leverage the power of pre-trained language models to encode items, understand user intent via conversation, perform item recommendation through semantic matching, and generate dialogues. As a unified model, our PECRS (Parameter-Efficient CRS) can be optimized in a single stage, without relying on non-textual metadata such as a knowledge graph. Experiments on two benchmark CRS datasets, ReDial and INSPIRED, demonstrate the effectiveness of PECRS on recommendation and conversation. Our code is available at: https://github.com/Ravoxsg/efficient_unified_crs.
Submitted 24 February, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Towards Autonomous Supply Chains: Definition, Characteristics, Conceptual Framework, and Autonomy Levels
Authors:
Liming Xu,
Stephen Mak,
Yaniv Proselkov,
Alexandra Brintrup
Abstract:
Recent global disruptions, such as the pandemic and geopolitical conflicts, have profoundly exposed vulnerabilities in traditional supply chains, requiring exploration of more resilient alternatives. Autonomous supply chains (ASCs) have emerged as a potential solution, offering increased visibility, flexibility, and resilience in turbulent trade environments. Despite discussions in industry and academia over several years, ASCs lack well-established theoretical foundations. This paper addresses this research gap by presenting a formal definition of ASC along with its defining characteristics and auxiliary concepts. We propose a layered conceptual framework called the MIISI model. An illustrative case study focusing on the meat supply chain demonstrates an initial ASC implementation based on this conceptual model. Additionally, we introduce a seven-level supply chain autonomy reference model, delineating a trajectory towards achieving full supply chain autonomy. Recognising that this work represents an initial endeavour, we emphasise the need for continued exploration in this emerging domain. We anticipate that this work will stimulate further research, both theoretical and technical, and contribute to the continual evolution of ASCs.
Submitted 13 October, 2023;
originally announced January 2024.
-
A New Look at the Scalar Meson $f_0(500)$ via $D^+\to π^+π^-\ell^+ν_\ell$ Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai,
X. Cai
, et al. (615 additional authors not shown)
Abstract:
Using $2.93~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773 GeV, we investigate the semileptonic decays $D^+\to π^+π^- \ell^+ν_\ell$ ($\ell=e$ and $μ$). The $D^+\to f_0(500)μ^+ν_μ$ decay is observed for the first time. By analyzing simultaneously the differential decay rates of $D^+\to f_0(500) μ^+ν_μ$ and $D^+\to f_0(500) e^+ν_e$ in different $\ell^+ν_\ell$ four-momentum transfer intervals, the product of the relevant hadronic form factor $f^{f_0}_{+}(0)$ and the magnitude of the $c\to d$ Cabibbo-Kobayashi-Maskawa matrix element $|V_{cd}|$ is determined to be $f_{+}^{f_0} (0)|V_{cd}|=0.0787\pm0.0060_{\rm stat}\pm0.0033_{\rm syst}$ for the first time. With the input of $|V_{cd}|$ from the global fit in the standard model, we determine $f_{+}^{f_0} (0)=0.350\pm0.027_{\rm stat}\pm0.015_{\rm syst}$. The absolute branching fractions of $D^+\to f_0(500)_{(π^+π^-)}μ^+ν_μ$ and $D^+\to ρ^0_{(π^+π^-)} μ^+ν_μ$ are determined as $(0.72\pm0.13_{\rm stat}\pm0.10_{\rm syst})\times10^{-3}$ and $(1.64\pm0.13_{\rm stat}\pm0.11_{\rm syst})\times 10^{-3}$. Combining these results with those of previous BESIII measurements on their semielectronic counterparts from the same data sample, we test lepton flavor universality by measuring the branching fraction ratios ${\mathcal B}_{D^+\to ρ^0 μ^+ν_μ}/{\mathcal B}_{D^+\to ρ^0 e^+ν_e}=0.88\pm0.10$ and ${\mathcal B}_{D^+\to f_0(500) μ^+ν_μ}/{\mathcal B}_{D^+\to f_0(500) e^+ν_e}=1.14\pm0.28$, which are compatible with the standard model expectation.
Submitted 4 February, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Force sensing to reconstruct potential energy landscapes for cluttered large obstacle traversal
Authors:
Yaqing Wang,
Ling Xu,
Chen Li
Abstract:
Visual sensing of environmental geometry allows robots to use artificial potential fields to avoid sparse obstacles. Yet robots must further traverse cluttered large obstacles for applications like search and rescue through rubble and planetary exploration across Martain rocks. Recent studies discovered that to traverse cluttered large obstacles, multi-legged insects and insect-inspired robots mak…
▽ More
Visual sensing of environmental geometry allows robots to use artificial potential fields to avoid sparse obstacles. Yet robots must further traverse cluttered large obstacles for applications like search and rescue through rubble and planetary exploration across Martain rocks. Recent studies discovered that to traverse cluttered large obstacles, multi-legged insects and insect-inspired robots make strenuous transitions across locomotor modes with major changes in body orientation. When viewed on a potential energy landscape resulting from locomotor-obstacle physical interaction, these are barrier-crossing transitions across landscape basins. This potential energy landscape approach may provide a modeling framework for cluttered large obstacle traversal. Here, we take the next step toward this vision by testing whether force sensing allows the reconstruction of the potential energy landscape. We developed a cockroach-inspired, minimalistic robot capable of sensing obstacle contact forces and torques around its body as it propelled forward against a pair of cluttered grass-like beam obstacles. We performed measurements over many traverses with systematically varied body orientations. Despite the forces and torques not being fully conservative, they well-matched the potential energy landscape gradients and the landscape reconstructed from them well-matched ground truth. In addition, inspired by cockroach observations, we found that robot head oscillation during traversal further improved the accuracies of force sensing and landscape reconstruction. We still need to study how to reconstruct landscape during a single traverse, as in applications, robots have little chance to use multiple traverses to sample the environment systematically and how to find landscape saddles for least-effort transitions to traverse.
Submitted 23 January, 2024;
originally announced January 2024.
-
ChatGraph: Chat with Your Graphs
Authors:
Yun Peng,
Sen Lin,
Qian Chen,
Lyu Xu,
Xiaojun Ren,
Yafei Li,
Jianliang Xu
Abstract:
Graph analysis is fundamental in real-world applications. Traditional approaches rely on SPARQL-like languages or clicking-and-dragging interfaces to interact with graph data. However, these methods either require users to possess high programming skills or support only a limited range of graph analysis functionalities. To address the limitations, we propose a large language model (LLM)-based framework called ChatGraph. With ChatGraph, users can interact with graphs through natural language, making it easier to use and more flexible than traditional approaches. The core of ChatGraph lies in generating chains of graph analysis APIs based on the understanding of the texts and graphs inputted in the user prompts. To achieve this, ChatGraph consists of three main modules: an API retrieval module that searches for relevant APIs, a graph-aware LLM module that enables the LLM to comprehend graphs, and an API chain-oriented finetuning module that guides the LLM in generating API chains.
Submitted 23 January, 2024;
originally announced January 2024.
-
Orion-14B: Open-source Multilingual Large Language Models
Authors:
Du Chen,
Yi Huang,
Xiaopu Li,
Yongqiang Li,
Yongqiang Liu,
Haihui Pan,
Leichao Xu,
Dacheng Zhang,
Zhipeng Zhang,
Kun Han
Abstract:
In this study, we introduce Orion-14B, a collection of multilingual large language models with 14 billion parameters. We utilize a data scheduling approach to train a foundational model on a diverse corpus of 2.5 trillion tokens, sourced from texts in English, Chinese, Japanese, Korean, and other languages. Additionally, we fine-tuned a series of models tailored for conversational applications and other specific use cases. Our evaluation results demonstrate that Orion-14B achieves state-of-the-art performance across a broad spectrum of tasks. We make the Orion-14B model family and its associated code publicly accessible at https://github.com/OrionStarAI/Orion, aiming to inspire future research and practical applications in the field.
Submitted 20 January, 2024;
originally announced January 2024.
-
SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese
Authors:
Liang Xu,
Hang Xue,
Lei Zhu,
Kangkang Zhao
Abstract:
We introduce SuperCLUE-Math6 (SC-Math6), a new benchmark dataset to evaluate the mathematical reasoning abilities of Chinese language models. SC-Math6 is designed as an upgraded Chinese version of the GSM8K dataset with enhanced difficulty, diversity, and application scope. It consists of over 2000 mathematical word problems requiring multi-step reasoning and providing natural language solutions. We propose an innovative scheme to quantify the reasoning capability of large models based on performance over problems with different reasoning steps. Experiments on 13 representative Chinese models demonstrate a clear stratification of reasoning levels, with top models like GPT-4 showing superior performance. SC-Math6 fills the gap in Chinese mathematical reasoning benchmarks and provides a comprehensive testbed to advance the intelligence of Chinese language models.
Submitted 1 February, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Prompt and nonprompt $ψ(2S)$ production in $p$Pb collisions at $\sqrt{s_{NN}}=8.16$ TeV
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1079 additional authors not shown)
Abstract:
The production of $ψ(2S)$ mesons in proton-lead collisions at a centre-of-mass energy per nucleon pair of $\sqrt{s_{NN}}=8.16$ TeV is studied with the LHCb detector using data corresponding to an integrated luminosity of 34 nb$^{-1}$. The prompt and nonprompt $ψ(2S)$ production cross-sections and the ratio of the $ψ(2S)$ to $J/ψ$ cross-section are measured as a function of the meson transverse momentum and rapidity in the nucleon-nucleon centre-of-mass frame, together with forward-to-backward ratios and nuclear modification factors. The production of prompt $ψ(2S)$ is observed to be more suppressed compared to $pp$ collisions than the prompt $J/ψ$ production, while the nonprompt productions have similar suppression factors.
Submitted 22 April, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Susceptibility of Adversarial Attack on Medical Image Segmentation Models
Authors:
Zhongxuan Wang,
Leo Xu
Abstract:
The nature of deep neural networks has given rise to a variety of attacks, but little work has been done to address the effect of adversarial attacks on segmentation models trained on MRI datasets. In light of the grave consequences that such attacks could cause, we explore four models from the U-Net family and examine their responses to the Fast Gradient Sign Method (FGSM) attack. We conduct FGSM attacks on each of them and experiment with various schemes to conduct the attacks. In this paper, we find that medical imaging segmentation models are indeed vulnerable to adversarial attacks and that there is a negligible correlation between parameter size and adversarial attack success. Furthermore, we show that using a different loss function than the one used for training yields higher adversarial attack success, contrary to what the FGSM authors suggested. In future efforts, we will conduct the experiments detailed in this paper with more segmentation models and different attacks. We will also attempt to find ways to counteract the attacks by using model ensembles or special data augmentations. Our code is available at https://github.com/ZhongxuanWang/adv_attk
Submitted 20 January, 2024;
originally announced January 2024.
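As a rough illustration of the FGSM attack named in the abstract above, the following sketch perturbs an input by a fixed step in the direction of the sign of the loss gradient. It uses a toy logistic-regression model in NumPy, not a U-Net segmentation model; the weights, input, label, and step size `eps` are all hypothetical values chosen for illustration and are not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on a single example."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(dL/dx).

    For a logistic model with BCE loss, dL/dx = (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)


# Hypothetical toy model and input (assumptions, not from the paper).
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.4, 0.9])
y = 1.0
eps = 0.1

x_adv = fgsm(w, b, x, y, eps)
loss_clean = bce_loss(w, b, x, y)
loss_adv = bce_loss(w, b, x_adv, y)
print(f"clean loss: {loss_clean:.4f}, adversarial loss: {loss_adv:.4f}")
```

The perturbation is bounded by `eps` in the max norm, and the loss on the perturbed input is higher than on the clean input. In a segmentation setting the same step would be applied per pixel with the gradient obtained by backpropagation.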
-
Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads
Authors:
Cunchen Hu,
Heyang Huang,
Liangliang Xu,
Xusheng Chen,
Jiang Xu,
Shuang Chen,
Hao Feng,
Chenxi Wang,
Sa Wang,
Yungang Bao,
Ninghui Sun,
Yizhou Shan
Abstract:
Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct characteristics of these phases, leading to significant interference. To mitigate interference, our insight is to carefully schedule and group inference requests based on their characteristics. We realize this idea in TetriInfer through three pillars. First, it partitions prompts into fixed-size chunks so that the accelerator always runs close to its computation-saturated limit. Second, it disaggregates prefill and decode instances so each can run independently. Finally, it uses a smart two-level scheduling algorithm augmented with predicted resource usage to avoid decode scheduling hotspots. Results show that TetriInfer improves time-to-first-token (TTFT), job completion time (JCT), and inference efficiency in terms of performance per dollar by a large margin, e.g., it uses 38% fewer resources while lowering average TTFT and average JCT by 97% and 47%, respectively.
Submitted 20 January, 2024;
originally announced January 2024.
-
A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model
Authors:
Hao Yang,
Jianxin Yuan,
Shuai Yang,
Linhe Xu,
Shuo Yuan,
Yifan Zeng
Abstract:
In online advertising scenarios, sellers often create multiple creatives to provide comprehensive demonstrations, making it essential to present the most appealing design to maximize the Click-Through Rate (CTR). However, sellers generally struggle to consider users' preferences for creative design, leading to relatively lower aesthetics and quantities compared to Artificial Intelligence (AI)-based approaches. Traditional AI-based approaches still face the same problem of not considering user information while having limited aesthetic knowledge from designers. In fact, by fusing user information, the generated creatives can be more attractive because different users may have different preferences. To optimize the results, the generated creatives in traditional methods are then ranked by another module named the creative ranking model. The ranking model can predict the CTR score for each creative considering user features. However, the two above stages are regarded as two different tasks and are optimized separately. In this paper, we propose a new automated Creative Generation pipeline for Click-Through Rate (CG4CTR) with the goal of improving CTR during the creative generation stage. Our contributions have four parts: 1) The inpainting mode in stable diffusion is applied for the first time to the creative generation task in the online advertising scene. A self-cyclic generation pipeline is proposed to ensure the convergence of training. 2) A prompt model is designed to generate individualized creatives for different user groups, which can further improve diversity and quality. 3) A reward model comprehensively considers the multimodal features of image and text to improve the effectiveness of the creative ranking task, and it is also critical in the self-cyclic pipeline. 4) The significant benefits obtained in online and offline experiments verify the significance of our proposed method.
Submitted 16 January, 2024;
originally announced January 2024.
-
Hazard resistance-based spatiotemporal risk analysis for distribution network outages during hurricanes
Authors:
Luo Xu,
Ning Lin,
Dazhi Xi,
Kairui Feng,
H. Vincent Poor
Abstract:
Blackouts in recent decades show an increasing prevalence of power outages due to extreme weather events such as hurricanes. Precisely assessing the spatiotemporal outages in distribution networks, the most vulnerable part of power systems, is critical to enhance power system resilience. The Sequential Monte Carlo (SMC) simulation method is widely used for spatiotemporal risk analysis of power systems during extreme weather hazards. However, it is found here that the SMC method can lead to large errors by directly applying the fragility function or failure probability of system components in time-sequential analysis, particularly overestimating damages under evolving hazards with high-frequency sampling. To address this issue, a novel hazard resistance-based spatiotemporal risk analysis (HRSRA) method is proposed. This method converts the time-varying failure probability of a component into a hazard resistance as a time-invariant value during the simulation of evolving hazards. The proposed HRSRA provides an adaptive framework for incorporating high-spatiotemporal-resolution meteorology models into power outage simulations. By leveraging the geographic information system data of the power system and a physics-based hurricane wind field model, the superiority of the proposed method is validated using real-world time-series power outage data from Puerto Rico during Hurricane Fiona 2022.
Submitted 18 January, 2024;
originally announced January 2024.
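The overestimation described in the abstract above can be sketched numerically. The example below is one possible reading of the idea, not the authors' HRSRA implementation: it assumes a hypothetical logistic fragility curve and a hypothetical wind profile, and compares (a) an independent failure draw at every time step, whose cumulative failure probability grows with the sampling frequency, against (b) a single time-invariant resistance drawn once, so that the component fails only when the hazard first exceeds it and the cumulative failure probability equals the peak fragility value.

```python
import numpy as np


def fragility(wind):
    """Hypothetical logistic fragility curve: P(failure | wind speed in m/s)."""
    return 1.0 / (1.0 + np.exp(-(wind - 50.0) / 5.0))


def wind_profile(n_steps):
    """Hypothetical storm passage: wind rises to a 45 m/s peak, then decays."""
    t = np.linspace(0.0, 1.0, n_steps)
    return 20.0 + 25.0 * np.sin(np.pi * t)


def cumulative_failure_independent(p):
    """Independent Bernoulli draw per step: 1 - prod(1 - p_t)."""
    return 1.0 - np.prod(1.0 - p)


def cumulative_failure_resistance(p):
    """Resistance drawn once as u ~ U(0, 1); failure iff max_t p_t >= u."""
    return float(np.max(p))


# Same storm sampled at a coarse and a fine time resolution.
p_coarse = fragility(wind_profile(7))
p_fine = fragility(wind_profile(97))

ind_coarse = cumulative_failure_independent(p_coarse)
ind_fine = cumulative_failure_independent(p_fine)
res_coarse = cumulative_failure_resistance(p_coarse)
res_fine = cumulative_failure_resistance(p_fine)

print(f"independent draws: coarse={ind_coarse:.3f}, fine={ind_fine:.3f}")
print(f"fixed resistance:  coarse={res_coarse:.3f}, fine={res_fine:.3f}")
```

Under these assumptions, refining the time step inflates the failure probability of the per-step scheme, while the fixed-resistance scheme gives the same answer at both resolutions.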
-
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
Authors:
Yichao Du,
Zhirui Zhang,
Linan Yue,
Xu Huang,
Yuqing Zhang,
Tong Xu,
Linli Xu,
Enhong Chen
Abstract:
To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the commonly used FL approach (i.e., \textsc{FedAvg}) in S2T tasks typically suffers from extensive communication overhead due to multi-round interactions based on the whole model and performance degradation caused by data heterogeneity among clients. To address these issues, we propose a personalized federated S2T framework that introduces \textsc{FedLoRA}, a lightweight LoRA module for client-side tuning and interaction with the server to minimize communication overhead, and \textsc{FedMem}, a global model equipped with a $k$-nearest-neighbor ($k$NN) classifier that captures client-specific distributional shifts to achieve personalization and overcome data heterogeneity. Extensive experiments based on Conformer and Whisper backbone models on CoVoST and GigaSpeech benchmarks show that our approach significantly reduces the communication overhead on all S2T tasks and effectively personalizes the global model to overcome data heterogeneity.
Submitted 18 January, 2024;
originally announced January 2024.
-
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Authors:
Tongxin Yuan,
Zhiwei He,
Lingzhong Dong,
Yiming Wang,
Ruijie Zhao,
Tian Xia,
Lizhen Xu,
Binglin Zhou,
Fangqi Li,
Zhuosheng Zhang,
Rui Wang,
Gongshen Liu
Abstract:
Large language models (LLMs) have exhibited great potential in autonomously completing tasks across real-world applications. Despite this, these LLM agents introduce unexpected safety risks when operating in interactive environments. Instead of centering on LLM-generated content safety as in most prior studies, this work addresses the imperative need for benchmarking the behavioral safety of LLM agents within diverse environments. We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records. R-Judge comprises 162 records of multi-turn agent interaction, encompassing 27 key risk scenarios among 7 application categories and 10 risk types. It incorporates human consensus on safety with annotated safety labels and high-quality risk descriptions. Evaluation of 9 LLMs on R-Judge shows considerable room for enhancing the risk awareness of LLMs: the best-performing model, GPT-4, achieves 72.52% in contrast to the human score of 89.07%, while all other models score below the random baseline. Moreover, further experiments demonstrate that leveraging risk descriptions as environment feedback achieves substantial performance gains. With case studies, we reveal that correlated to parameter amount, risk awareness in open agent scenarios is a multi-dimensional capability involving knowledge and reasoning, thus challenging for current LLMs. R-Judge is publicly available at https://github.com/Lordog/R-Judge.
Submitted 17 February, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Deep Ensemble Shape Calibration: Multi-Field Post-hoc Calibration in Online Advertising
Authors:
Shuai Yang,
Hao Yang,
Zhuang Zou,
Linhe Xu,
Shuo Yuan,
Yifan Zeng
Abstract:
In the e-commerce advertising scenario, estimating the true probabilities (known as a calibrated estimate) on Click-Through Rate (CTR) and Conversion Rate (CVR) is critical. Previous research has introduced numerous solutions for addressing the calibration problem. These methods typically involve the training of calibrators using a validation set and subsequently applying these calibrators to correct the original estimated values during online inference.
However, what sets e-commerce advertising scenarios apart is the challenge of multi-field calibration. Multi-field calibration requires achieving calibration in each field. In order to achieve multi-field calibration, it is necessary to have a strong data utilization ability. Because the quantity of pCTR specified range for a single field-value (such as user ID and item ID) sample is relatively small, this makes the calibrator more difficult to train. However, existing methods have difficulty effectively addressing these issues.
To solve these problems, we propose a new method named Deep Ensemble Shape Calibration (DESC). In terms of business understanding and interpretability, we decompose multi-field calibration into value calibration and shape calibration. We introduce innovative basis calibration functions, which enhance both function expression capabilities and data utilization by combining these basis calibration functions. A significant advancement lies in the development of an allocator capable of allocating the most suitable calibrators to different estimation error distributions within diverse fields and values. We achieve significant improvements in both public and industrial datasets. In online experiments, we observe a +2.5% increase in CVR and +4.0% in GMV (Gross Merchandise Volume). Our code is now available at: https://github.com/HaoYang0123/DESC.
Submitted 20 May, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Measurement of Born cross section of $e^{+}e^{-}\rightarrowΣ^{+}\barΣ^{-}$ at center-of-mass energies between 3.510 and 4.951 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (632 additional authors not shown)
Abstract:
Using 24.1 fb$^{-1}$ of $e^{+}e^{-}$ collision data collected with the BESIII detector at the BEPCII collider, the Born cross sections and effective form factors of the $e^{+}e^{-}\rightarrowΣ^{+}\barΣ^{-}$ reaction are measured. The measurements are performed at center-of-mass energies ranging from 3.510 to 4.951 GeV. No significant evidence for the decay of the charmonium(-like) states, $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $Y(4230)$, $Y(4360)$, $ψ(4415)$, and $Y(4660)$, into a $Σ^{+}\barΣ^{-}$ final state is observed. Consequently, upper limits for the products of the branching fractions and the electronic partial widths at the 90% confidence level are reported for these decays.
Submitted 6 May, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
First measurements of the absolute branching fraction of $Λ_{c}(2625)^{+}\to Λ^{+}_{c}π^+π^-$ and upper limit on $Λ_{c}(2595)^{+}\to Λ^{+}_{c}π^+π^-$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (603 additional authors not shown)
Abstract:
The absolute branching fraction of the decay $Λ_{c}(2625)^{+}\to Λ^{+}_{c}π^+π^-$ is measured for the first time to be $(50.7 \pm 5.0_{\rm{stat.}} \pm 4.9_{\rm{syst.}} )\%$ with 368.48 pb$^{-1}$ of $e^+e^-$ collision data collected by the BESIII detector at the center-of-mass energies of $\sqrt{s} = 4.918$ and $4.950$ GeV. This result is lower than the naive prediction of 67\%, obtained from isospin symmetry, by more than $2σ$, thereby indicating that the novel mechanism referred to as the \textit{threshold effect}, proposed for the strong decays of $Λ_{c}(2595)^{+}$, also applies to $Λ_{c}(2625)^{+}$. This measurement is necessary to obtain the coupling constants for the transitions between $s$-wave and $p$-wave charmed baryons in heavy hadron chiral perturbation theory. In addition, we search for the decay $Λ_{c}(2595)^{+}\to Λ^{+}_{c}π^+π^-$. No significant signal is observed, and the upper limit on its branching fraction is determined to be 80.8\% at the 90\% confidence level.
Submitted 17 January, 2024;
originally announced January 2024.
-
Improved measurements of the Dalitz decays $η/η'\rightarrowγe^{+}e^{-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (618 additional authors not shown)
Abstract:
Based on a data sample of 10 billion $J/ψ$ events collected with the BESIII detector, improved measurements of the Dalitz decays $η/η'\rightarrowγe^+e^-$ are performed, where the $η$ and $η'$ are produced through the radiative decays $J/ψ\rightarrowγη/η'$. The branching fractions of $η\rightarrowγe^+e^-$ and $η'\rightarrowγe^+e^-$ are measured to be $(7.07 \pm 0.05 \pm 0.23)\times10^{-3}$ and $(4.83\pm0.07\pm0.14)\times10^{-4}$, respectively. Within the single pole model, the parameter of electromagnetic transition form factor for $η\rightarrowγe^+e^-$ is determined to be $Λ_η=(0.749 \pm 0.027 \pm 0.007)~ {\rm GeV}/c^{2}$. Within the multi-pole model, we extract the electromagnetic transition form factors for $η'\rightarrowγe^+e^-$ to be $Λ_{η'} = (0.802 \pm 0.007\pm 0.008)~ {\rm GeV}/c^{2}$ and $γ_{η'} = (0.113\pm0.010\pm0.002)~ {\rm GeV}/c^{2}$. The results are consistent with both theoretical predictions and previous measurements. The characteristic sizes of the interaction regions for the $η$ and $η'$ are calculated to be $(0.645 \pm 0.023 \pm 0.007 )~ {\rm fm}$ and $(0.596 \pm 0.005 \pm 0.006)~ {\rm fm}$, respectively. In addition, we search for the dark photon in $η/η^\prime\rightarrowγe^{+}e^{-}$, and the upper limits of the branching fractions as a function of the dark photon are given at 90\% confidence level.
Submitted 5 April, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
An Improved Virtual Force Approach for UAV Deployment and Resource Allocation in Emergency Communications
Authors:
Hongying Guo,
Li Wang,
Ruoguang Li,
Luyang Hou,
Lianming Xu,
Aiguo Fei
Abstract:
In this paper, we consider an unmanned aerial vehicle (UAV)-enabled emergency communication system, which establishes temporary communication links with user equipment (UEs) in a typical disaster environment with mountainous forest and obstacles. Towards this end, a joint deployment, power allocation, and user association optimization problem is formulated to maximize the total transmission rate, while considering the demand of each UE and the disaster environment characteristics. Then, an alternating optimization algorithm is proposed by integrating a coalition game and a virtual force approach, which captures the impact of the demand priority of UEs and of the obstacles on the flight path and consumed power. Simulation results demonstrate that the computation time consumed by our proposed algorithm is only $5.6\%$ of that of traditional heuristic algorithms, which validates its effectiveness in disaster scenarios.
Submitted 17 January, 2024;
originally announced January 2024.
-
First study of antihyperon-nucleon scattering $\barΛp\rightarrow\barΛp$ and measurement of $Λp\rightarrowΛp$ cross section
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Using $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected with the BESIII detector at the BEPCII storage ring, the processes $Λp\rightarrowΛp$ and $\barΛp\rightarrow\barΛp$ are studied, where the $Λ/\barΛ$ baryons are produced in the process $J/ψ\rightarrowΛ\barΛ$ and the protons are the hydrogen nuclei in the cooling oil of the beam pipe. Clear signals are observed for the two reactions. The cross sections in $-0.9\leq\rm{cos}θ_{Λ/\barΛ}\leq0.9$ are measured to be $σ(Λp\rightarrowΛp)=(12.2\pm1.6_{\rm{stat}}\pm1.1_{\rm{sys}})$ mb and $σ(\barΛ p\rightarrow\barΛ p)=(17.5\pm2.1_{\rm{stat}}\pm1.6_{\rm{sys}})$ mb at the $Λ/\barΛ$ momentum of $1.074$ GeV/$c$ within a range of $\pm0.017$ GeV/$c$, where the $θ_{Λ/\barΛ}$ are the scattering angles of the $Λ/\barΛ$ in the $Λp/\barΛp$ rest frames. Furthermore, the differential cross sections of the two reactions are also measured, where there is a slight tendency of forward scattering for $Λp\rightarrowΛp$, and a strong forward peak for $\barΛp\rightarrow\barΛp$. We present an approach to extract the total elastic cross sections by extrapolation. The study of $\barΛp\rightarrow\barΛp$ represents the first study of antihyperon-nucleon scattering, and these new measurements will serve as important inputs for the theoretical understanding of the (anti)hyperon-nucleon interaction.
Submitted 18 May, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Observation of $ψ(3686) \to Ω^- K^+ \barΞ^0 $+c.c
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (630 additional authors not shown)
Abstract:
Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected with the BESIII detector at BEPCII, the decay of $ψ(3686) \to Ω^- K^+ \barΞ^0 +c.c.$ is observed for the first time. The branching fraction of this decay is measured to be $\mathcal{B}_{ψ(3686) \to Ω^- K^+ \barΞ^0 +c.c.}=(2.78 \pm 0.40 \pm 0.18 ) \times 10^{-6}$, where the first uncertainty is statistical and the second is systematic. Possible baryon excited states are searched for in this decay, but no evident intermediate state is observed with the current sample size.
Submitted 15 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Emergency Localization for Mobile Ground Users: An Adaptive UAV Trajectory Planning Method
Authors:
Zhihao Zhu,
Jiafan He,
Luyang Hou,
Lianming Xu,
Wendi Zhu,
Li Wang
Abstract:
In emergency search and rescue scenarios, quickly locating trapped people is essential. However, disasters can render the Global Positioning System (GPS) unusable. Unmanned aerial vehicles (UAVs) with localization devices can serve as mobile anchors due to their agility and high line-of-sight (LoS) probability. Nonetheless, the number of available UAVs during the initial stages of disaster relief is limited, and innovative methods are needed to quickly plan UAV trajectories to locate non-uniformly distributed dynamic targets while ensuring localization accuracy. To address this challenge, we design a single-UAV localization method that requires no hovering, use maximum likelihood estimation (MLE) to estimate the location of mobile users, and define an upper bound on the localization error that accounts for users' movement. Combining this localization method and the localization error index, we use the enhanced particle swarm optimization (EPSO) algorithm and an edge access strategy to develop a low-complexity, localization-oriented adaptive trajectory planning algorithm. Simulation results demonstrate that our method outperforms baseline algorithms, enabling faster localization without compromising localization accuracy.
Submitted 14 January, 2024;
originally announced January 2024.