-
Co-benefits of Agricultural Diversification and Technology for Food and Nutrition Security in China
Authors:
Thomas Cherico Wanger,
Estelle Raveloaritiana,
Siyan Zeng,
Haixiu Gao,
Xueqing He,
Yiwen Shao,
Panlong Wu,
Kris A. G. Wyckhuys,
Wenwu Zhou,
Yi Zou,
Zengrong Zhu,
Ling Li,
Haiyan Cen,
Yunhui Liu,
Shenggen Fan
Abstract:
China is the leading crop producer and has successfully implemented sustainable development programs related to agriculture. Sustainable agriculture has been promoted to achieve national food security targets such as food self-sufficiency through the well-facilitated farmland construction (WFFC) approach. The WFFC is introduced in China's current national 10-year plan to consolidate farmlands into large and simplified production areas to maximise automation, and improve soil fertility and productivity. However, research suggests that diversified and smaller farms facilitate ecosystem services, can improve yield resilience, defuse human health threats, and increase farm profitability. Currently, WFFC has not considered ecological farmland improvements and it may miss long-term environmental benefits including ecosystem service preservation conducive to yields. Moreover, the nutritional status in China has changed in recent decades: undernutrition has been dramatically reduced, but the prevalence of overweight, obesity, and chronic diseases has increased. While a strategic choice and management of crop and livestock species can improve nutrition, the environmental and production benefits of agricultural diversification are currently not well interlinked with China's food and nutrition security discussions. Lastly, the role of agricultural technology for socioeconomic benefits and the link with diversified agricultural production may provide vast benefits for food security. Here, we focus on the opportunities and co-benefits of agricultural diversification and technology innovations to advance food and nutrition security in China through ecosystem service and yield benefits. Our applied five-point research agenda can provide evidence-based opportunities to support China in reaching its ambitious food security targets through agricultural diversification with global ramifications.
Submitted 1 July, 2024;
originally announced July 2024.
-
Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation
Authors:
Zihan Gao,
Lingling Li,
Licheng Jiao,
Fang Liu,
Xu Liu,
Wenping Ma,
Yuwei Guo,
Shuyuan Yang
Abstract:
Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enable open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. While effective, the per-pixel distillation of high-dimensional CLIP features introduces ambiguity and necessitates complex regularization strategies, adding inefficiencies during training. This paper presents MaskField, which enables fast and efficient 3D open-vocabulary segmentation with neural fields under weak supervision. Unlike previous methods, MaskField distills masks rather than dense high-dimensional CLIP features. MaskField employs neural fields as binary mask generators and supervises them with masks generated by SAM and classified by coarse CLIP features. MaskField overcomes the ambiguous object boundaries by naturally introducing SAM segmented object shapes without extra regularization during training. By circumventing the direct handling of high-dimensional CLIP features during training, MaskField is particularly compatible with explicit scene representations like 3DGS. Our extensive experiments show that MaskField not only surpasses prior state-of-the-art methods but also achieves remarkably fast convergence, outperforming previous methods with just 5 minutes of training. We hope that MaskField will inspire further exploration into how neural fields can be trained to comprehend 3D scenes from 2D models.
Submitted 1 July, 2024;
originally announced July 2024.
-
Orbital origin of magnetic moment enhancement induced by charge density wave in kagome FeGe
Authors:
Shulun Han,
Linyang Li,
Chi Sin Tang,
Qi Wang,
Lingfeng Zhang,
Caozheng Diao,
Mingwen Zhao,
Shuo Sun,
Lijun Tian,
Mark B. H. Breese,
Chuanbing Cai,
Milorad V. Milosevic,
Yanpeng Qi,
Andrew T. S. Wee,
Xinmao Yin
Abstract:
Interactions among various electronic states such as CDW, magnetism, and superconductivity are of high significance in strongly correlated systems. While significant progress has been made in understanding the relationship between CDW and superconductivity, the interplay between CDW and magnetic order remains largely elusive. Kagome lattices, which intertwine nontrivial topology, charge order, and magnetism, offer an ideal platform for such studies. The kagome magnet FeGe, hosting the unique coupling between CDW and magnetism, has recently garnered considerable attention in that respect. Here we reveal the significant role of the orbital coupling effect during the CDW phase transition, highlighting the orbital origin of the magnetic moment enhancement in FeGe. Our X-ray absorption experiments and first-principles calculations illuminate the temperature-dependent behavior of Fe 3d-Ge 4p orbital hybridization and corroborate its pivotal impact on the magnetic properties of FeGe. These findings introduce an orbital dimension to the correlation between charge and magnetic degrees of freedom, advancing our understanding of the intriguing quantum phases resulting from this interplay.
Submitted 1 July, 2024;
originally announced July 2024.
-
Measurement of the integrated luminosity of data samples collected during 2019-2022 by the Belle II experiment
Authors:
The Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
J. K. Ahn,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Althubiti,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien
, et al. (382 additional authors not shown)
Abstract:
A series of data samples was collected with the Belle II detector at the SuperKEKB collider from March 2019 to June 2022. We determine the integrated luminosities of these data samples using three distinct methodologies involving Bhabha ($e^+e^- \to e^+e^-(nγ)$), digamma ($e^+e^- \to γγ(nγ)$), and dimuon ($e^+e^- \to μ^+ μ^- (nγ)$) events. The total integrated luminosity obtained with Bhabha, digamma, and dimuon events is (426.52 $\pm$ 0.03 $\pm$ 2.48)~fb$^{-1}$, (427.32 $\pm$ 0.03 $\pm$ 2.56)~fb$^{-1}$, and (424.84 $\pm$ 0.04 $\pm$ 3.88)~fb$^{-1}$, where the first uncertainties are statistical and the second are systematic. The resulting total integrated luminosity obtained from the combination of the three methods is (426.88 $\pm$ 1.93)~fb$^{-1}$.
Submitted 1 July, 2024;
originally announced July 2024.
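As a rough cross-check of the combination quoted in the abstract above, the sketch below forms a naive inverse-variance weighted average of the three luminosity values, adding the statistical and systematic uncertainties in quadrature. It assumes the three methods are uncorrelated, which is an illustrative simplification and not the collaboration's procedure; the function name `naive_combination` is hypothetical.

```python
import math

# (value, statistical, systematic) in fb^-1, copied from the abstract.
MEASUREMENTS = {
    "Bhabha": (426.52, 0.03, 2.48),
    "digamma": (427.32, 0.03, 2.56),
    "dimuon": (424.84, 0.04, 3.88),
}


def naive_combination(measurements):
    """Inverse-variance weighted mean, ASSUMING uncorrelated uncertainties.

    This is an illustration only; a real combination must model the
    correlations between the systematic uncertainties of the methods.
    """
    weight_total = 0.0
    weighted_sum = 0.0
    for value, stat, syst in measurements.values():
        variance = stat ** 2 + syst ** 2  # quadrature sum per method
        weight = 1.0 / variance
        weight_total += weight
        weighted_sum += weight * value
    mean = weighted_sum / weight_total
    sigma = 1.0 / math.sqrt(weight_total)
    return mean, sigma


mean, sigma = naive_combination(MEASUREMENTS)
print(f"naive combination: {mean:.2f} +/- {sigma:.2f} fb^-1")
```

Under the no-correlation assumption the result comes out near 426.5 fb^-1 with an uncertainty of about 1.6 fb^-1, which does not reproduce the published (426.88 $\pm$ 1.93) fb$^{-1}$. The difference suggests that the published combination treats the uncertainties differently, presumably by accounting for correlations between the methods.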
-
FedEx: Expediting Federated Learning over Heterogeneous Mobile Devices by Overlapping and Participant Selection
Authors:
Jiaxiang Geng,
Boyu Li,
Xiaoqi Qin,
Yixuan Li,
Liang Li,
Yanzhao Hou,
Miao Pan
Abstract:
Training latency is critical for the success of the many applications enabled by federated learning (FL) over heterogeneous mobile devices. By overlapping local gradient transmission with continuous local computing, FL can remarkably reduce its training latency over homogeneous clients, yet it encounters severe model staleness, model drifts, memory cost and straggler issues in heterogeneous environments. To unleash the full potential of overlapping, we propose FedEx, a novel \underline{fed}erated learning approach to \underline{ex}pedite FL training over mobile devices under data, computing and wireless heterogeneity. FedEx redefines the overlapping procedure with staleness ceilings to constrain memory consumption and make overlapping compatible with participant selection (PS) designs. Then, FedEx characterizes the PS utility function by considering the latency reduced by overlapping, and provides a holistic PS solution to address the straggler issue. FedEx also introduces a simple but effective metric to trigger overlapping, in order to avoid model drifts. Experimental results show that, compared with its peer designs, FedEx demonstrates substantial reductions in FL training latency over heterogeneous mobile devices with limited memory cost.
Submitted 30 June, 2024;
originally announced July 2024.
-
FoldGPT: Simple and Effective Large Language Model Compression Scheme
Authors:
Songwei Liu,
Chao Zeng,
Lianqiang Li,
Chenqian Yan,
Lean Fu,
Xing Mei,
Fangmin Chen
Abstract:
The demand for deploying large language models (LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and find that the outputs of most layers exhibit significant similarity. Moreover, this similarity becomes more pronounced as the model size increases, indicating substantial redundancy in the depth direction of the LLMs. Based on this observation, we propose an efficient model volume compression strategy, termed FoldGPT, which combines block removal and block parameter sharing. This strategy consists of three parts: (1) Based on the learnable gating parameters, we determine the block importance ranking while modeling the coupling effect between blocks. Then we delete some redundant layers based on the given removal rate. (2) For the retained blocks, we apply a specially designed group parameter sharing strategy, where blocks within the same group share identical weights, significantly compressing the number of parameters and slightly reducing latency overhead. (3) After sharing these blocks, we "cure" the mismatch caused by sparsity with a minor amount of fine-tuning and introduce a tail-layer distillation strategy to improve the performance. Experiments demonstrate that FoldGPT outperforms previous state-of-the-art (SOTA) methods in efficient model compression, demonstrating the feasibility of achieving model lightweighting through straightforward block removal and parameter sharing.
Submitted 30 June, 2024;
originally announced July 2024.
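The observation that the outputs of most layers are highly similar can be illustrated with a small sketch. Cosine similarity between the outputs of adjacent blocks is used here as an assumed metric, since the abstract does not say which measure the authors use; the toy vectors and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def adjacent_layer_similarity(layer_outputs):
    """Similarity between the outputs of each pair of consecutive blocks."""
    return [
        cosine_similarity(layer_outputs[i], layer_outputs[i + 1])
        for i in range(len(layer_outputs) - 1)
    ]


# Hypothetical hidden states of one token after each of four blocks.
toy_outputs = [
    [1.0, 0.0, 0.5],
    [1.0, 0.1, 0.5],
    [0.9, 0.1, 0.6],
    [-0.2, 1.0, 0.1],
]

sims = adjacent_layer_similarity(toy_outputs)
for index, value in enumerate(sims):
    print(f"block {index} -> block {index + 1}: similarity {value:.3f}")
```

In this toy data the first three outputs barely change from block to block, so the corresponding blocks would be natural candidates for removal or weight sharing, while the last block changes the representation substantially.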
-
Study of $χ_{bJ}(2P)\toωΥ(1S)$ at Belle
Authors:
Z. S. Stottler,
T. K. Pedlar,
B. G. Fulsom,
I. Adachi,
K. Adamczyk,
H. Aihara,
S. Al Said,
D. M. Asner,
H. Atmacan,
T. Aushev,
R. Ayad,
V. Babu,
Sw. Banerjee,
M. Bauer,
P. Behera,
K. Belous,
J. Bennett,
F. Bernlochner,
M. Bessner,
T. Bilka,
D. Biswas,
A. Bobrov,
D. Bodrov,
G. Bonvicini,
J. Borah
, et al. (159 additional authors not shown)
Abstract:
We report a study of the hadronic transitions $χ_{bJ}(2P)\toωΥ(1S)$, with $ω\toπ^{+}π^{-}π^{0}$, using $28.2\times10^6~Υ(3S)$ mesons recorded by the Belle detector. We present the first evidence for the near-threshold transition $χ_{b0}(2P)\toωΥ(1S)$, the analog of the charm sector decay $χ_{c1}(3872)\toωJ/ψ$, with a branching fraction of $\mathcal{B}\big(χ_{b0}(2P)\toωΥ(1S)\big) = \big(0.55\pm0.19\pm0.07\big)\%$. We also obtain branching fractions of $\mathcal{B}\big(χ_{b1}(2P)\toωΥ(1S)\big) = \big(2.39{}^{+0.20}_{-0.19}\pm0.24\big)\%$ and $\mathcal{B}\big(χ_{b2}(2P)\toωΥ(1S)\big) = \big(0.47{}^{+0.13}_{-0.12}\pm0.06\big)\%$, confirming the measurement of the $ω$ transitions of the $J=1,2~P$-wave states. The ratio for the $J=2$ to $J=1$ transitions is also measured and found to differ by 3.3 standard deviations from the expected value in the QCD multipole expansion.
Submitted 30 June, 2024;
originally announced July 2024.
-
IVCA: Inter-Relation-Aware Video Complexity Analyzer
Authors:
Junqi Liao,
Yao Li,
Zhuoyuan Li,
Li Li,
Dong Liu
Abstract:
To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA. Next, drawing inspiration from the hierarchical reference structure in codecs, we design layer-aware weights to adjust the majorities of frame complexity in different layers. Additionally, we expand the scope of temporal features by considering the frames that are referred to, rather than relying solely on the previous frame. Experimental results show a significant improvement in complexity estimation accuracy achieved by IVCA, with a minimal increase in time complexity.
Submitted 28 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the first uncertainty is statistical and the second systematic.
Submitted 28 June, 2024;
originally announced July 2024.
-
Parallax-tolerant Image Stitching via Segmentation-guided Multi-homography Warping
Authors:
Tianli Liao,
Ce Wang,
Lei Li,
Guangen Liu,
Nan Li
Abstract:
Large parallax between images is an intractable issue in image stitching. Various warping-based methods have been proposed to address it, yet the results are unsatisfactory. In this paper, we propose a novel image stitching method using multi-homography warping guided by image segmentation. Specifically, we leverage the Segment Anything Model to segment the target image into numerous contents and partition the feature points into multiple subsets via the energy-based multi-homography fitting algorithm. The multiple subsets of feature points are used to calculate the corresponding multiple homographies. For each segmented content in the overlapping region, we select its best-fitting homography with the lowest photometric error. For each segmented content in the non-overlapping region, we calculate a weighted combination of the linearized homographies. Finally, the target image is warped via the best-fitting homographies to align with the reference image, and the final panorama is generated via linear blending. Comprehensive experimental results on public datasets demonstrate that our method provides the best alignment accuracy by a large margin, compared with the state-of-the-art methods. The source code is available at https://github.com/tlliao/multi-homo-warp.
Submitted 28 June, 2024;
originally announced June 2024.
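The per-segment selection step described in the abstract above (pick, for each segmented content, the homography with the lowest photometric error) can be sketched as follows. This is not the authors' implementation: the error definition (mean absolute intensity difference with nearest-neighbour sampling), the toy images, and all function names are assumptions made for illustration.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through a 3x3 homography given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def photometric_error(H, segment_pixels, target, reference):
    """Mean absolute intensity difference over one segment (assumed metric).

    segment_pixels holds integer (x, y) coordinates in the target image.
    Pixels that map outside the reference image are skipped; if none land
    inside, the error is infinite.
    """
    height, width = len(reference), len(reference[0])
    total, count = 0.0, 0
    for x, y in segment_pixels:
        u, v = apply_homography(H, x, y)
        col, row = round(u), round(v)  # nearest-neighbour sampling
        if 0 <= row < height and 0 <= col < width:
            total += abs(target[y][x] - reference[row][col])
            count += 1
    return total / count if count else float("inf")


def best_homography(homographies, segment_pixels, target, reference):
    """Index of the candidate homography with the lowest photometric error."""
    errors = [
        photometric_error(H, segment_pixels, target, reference)
        for H in homographies
    ]
    return min(range(len(errors)), key=errors.__getitem__)


# Toy grayscale images: the reference is the target shifted one pixel right.
target = [[10, 20, 30, 0], [40, 50, 60, 0]]
reference = [[0, 10, 20, 30], [0, 40, 50, 60]]
segment = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shift_right = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]

chosen = best_homography([identity, shift_right], segment, target, reference)
print("best-fitting homography index:", chosen)
```

In the toy example the pure translation aligns the segment exactly, so it is selected over the identity. A practical version would use interpolated sampling and restrict the comparison to the overlapping region.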
-
Deep Convolutional Neural Networks Meet Variational Shape Compactness Priors for Image Segmentation
Authors:
Kehui Zhang,
Lingfeng Li,
Hao Liu,
Jing Yuan,
Xue-Cheng Tai
Abstract:
Shape compactness is a key geometrical property to describe interesting regions in many image segmentation tasks. In this paper, we propose two novel algorithms to solve an image segmentation problem that incorporates a shape-compactness prior. Existing algorithms for such a problem often suffer from computational inefficiency, difficulty in reaching a local minimum, and the need to fine-tune the hyperparameters. To address these issues, we propose a novel optimization model along with its equivalent primal-dual model and introduce a new optimization algorithm based on primal-dual threshold dynamics (PD-TD). Additionally, we relax the solution constraint and propose another novel primal-dual soft threshold-dynamics algorithm (PD-STD) to achieve superior performance. Based on the variational explanation of the sigmoid layer, the proposed PD-STD algorithm can be integrated into Deep Neural Networks (DNNs) to enforce compact regions as image segmentation results. Extensive experiments demonstrate that, compared with existing deep learning methods, the proposed algorithms outperform state-of-the-art algorithms in numerical efficiency and effectiveness, especially when applied to the popular networks DeepLabV3 and IrisParseNet, with higher IoU, Dice, and compactness metrics on noisy iris datasets. In particular, the proposed algorithms significantly improve IoU by 20% when training on a highly noisy image dataset.
Submitted 23 May, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Authors:
Ling Li,
Yu Ye,
Bingchuan Jiang,
Wei Zeng
Abstract:
This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM: existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify the degree to which street-view images are locatable, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are utilized to train GeoReasoner, which undergoes fine-tuning through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations illustrate that GeoReasoner outperforms counterpart LVLMs by more than 25% at country-level and 38% at city-level geo-localization tasks, and surpasses StreetCLIP performance while requiring fewer training resources. The data and code are available at https://github.com/lingli1996/GeoReasoner.
Submitted 3 June, 2024;
originally announced June 2024.
-
Symbolic Learning Enables Self-Evolving Agents
Authors:
Wangchunshu Zhou,
Yixin Ou,
Shengwei Ding,
Long Li,
Jialong Wu,
Tiannan Wang,
Jiamin Chen,
Shuai Wang,
Xiaohua Xu,
Ningyu Zhang,
Huajun Chen,
Yuchen Eleanor Jiang
Abstract:
The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language model (LLM) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agent research is that it is model-centric, or engineering-centric. That is to say, progress on the prompts, tools, and pipelines of language agents requires substantial manual engineering effort from human experts rather than automatic learning from data. We believe the transition from model-centric, or engineering-centric, to data-centric, i.e., the ability of language agents to autonomously learn and evolve in environments, is the key for them to possibly achieve AGI.
In this work, we introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own in a data-centric way using symbolic optimizers. Specifically, we consider agents as symbolic networks where learnable weights are defined by prompts, tools, and the way they are stacked together. Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning: back-propagation and gradient descent. Instead of dealing with numeric weights, agent symbolic learning works with natural language simulacrums of weights, loss, and gradients. We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks and show that agent symbolic learning enables language agents to update themselves after being created and deployed in the wild, resulting in "self-evolving agents".
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Ultrafast (10 GHz) mid-IR modulator based on ultra-fast electrical switching of the light-matter coupling
Authors:
Mario Malerba,
Stefano Pirotta,
Guy Aubin,
Luca Lucia,
Mathieu Jeannin,
Jean-Michel Manceau,
Adel Bousseksou,
Quyang Lin,
Jean-Francois Lampin,
Emilien Peytavit,
Stefano Barbieri,
Lianhe Li,
Giles Davies,
Edmund H. Linfield,
Raffaele Colombelli
Abstract:
We demonstrate a free-space amplitude modulator for mid-infrared radiation (λ = 9.6 μm) that operates at room temperature up to at least 20 GHz (above the -3 dB cutoff frequency measured at 8.2 GHz). The device relies on the ultra-fast transition between weak and strong-coupling regimes induced by the variation of the applied bias voltage. Such a transition induces a modulation of the device reflectivity. The device is made of a semiconductor heterostructure enclosed in a judiciously designed array of metal-metal optical resonators that, taken together, behave as an electrically tunable surface. At negative bias, it operates in the weak light-matter coupling regime. Upon application of an appropriate positive bias, the quantum wells populate with electrons and the device transitions to the strong-coupling regime. The modulator transmission remains linear with input RF power in the 0 dBm to 9 dBm range. Increasing the optical power up to 25 mW shows only a weak onset of saturation slightly below that value.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
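As an illustration of the asymmetry definition quoted in the abstract, the following minimal Python sketch evaluates $R = (\mathcal{B}_S - \mathcal{B}_L)/(\mathcal{B}_S + \mathcal{B}_L)$ and a first-order uncertainty. The function names are hypothetical, the treatment of the two branching fractions as uncorrelated is an assumption (the abstract does not say how the uncertainties were combined), and the $K_S^0$ input below is a placeholder, not the known value used by the collaboration.

```python
import math


def ks_kl_asymmetry(b_s: float, b_l: float) -> float:
    """R = (B_S - B_L) / (B_S + B_L), as defined in the abstract."""
    return (b_s - b_l) / (b_s + b_l)


def asymmetry_uncertainty(b_s: float, sig_s: float,
                          b_l: float, sig_l: float) -> float:
    """First-order error propagation, ASSUMING uncorrelated inputs."""
    denom = (b_s + b_l) ** 2
    d_r_d_s = 2.0 * b_l / denom    # partial derivative dR/dB_S
    d_r_d_l = -2.0 * b_s / denom   # partial derivative dR/dB_L
    return math.hypot(d_r_d_s * sig_s, d_r_d_l * sig_l)


if __name__ == "__main__":
    # B_L and its uncertainties (in %) are taken from the abstract;
    # statistical and systematic parts are combined in quadrature here.
    b_l, sig_l = 1.67, math.hypot(0.06, 0.04)
    # PLACEHOLDER value for B_S (in %), chosen only for illustration.
    b_s, sig_s = 1.60, 0.05
    r = ks_kl_asymmetry(b_s, b_l)
    sig_r = asymmetry_uncertainty(b_s, sig_s, b_l, sig_l)
    print(f"R = {r:.3f} +/- {sig_r:.3f}")
```

With equal branching fractions the asymmetry is exactly zero, which is the baseline against which the reported values are compared.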
Submitted 26 June, 2024;
originally announced June 2024.
-
Theoretical insights into charge transfer plasmon lifetime
Authors:
Alemayehu Nana Koya,
Longnan Li,
Wei Li
Abstract:
Understanding the spectral and temporal dynamics of charge transfer plasmon resonances that emerge in conductively connected plasmonic nanoparticles is crucial for exploiting their potentials for enhanced infrared spectroscopy and optical computing. In this article, we present a theoretical study based on classical electromagnetism to describe the spectral signature and dephasing time of charge transfer plasmons. By fitting the scattering curves and near-field amplitude oscillations, we determine the spectral linewidth and lifetime of charge transfer plasmons in conductively connected gold nanodisk dimers. We find that, compared with the well-known particle plasmons and dimer plasmons, charge transfer plasmons have a longer lifetime, which can be further extended by manipulating the geometric parameters of nanojunction and nanoparticles. Moreover, quantitative analyses of the optical near-field amplitude reveal that charge transfer plasmon modes oscillate completely out of phase with particle plasmon and dimer plasmon modes. The dephasing time and charge transfer rate are found to be on a few femtosecond timescale, implying that conductively connected plasmonic nanoparticles hold great promise as channels for coherent transfer of energy and information in future all-optical computing devices.
Submitted 25 June, 2024;
originally announced June 2024.
-
Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification
Authors:
Huiyao Chen,
Yu Zhao,
Zulong Chen,
Mengjia Wang,
Liangyue Li,
Meishan Zhang,
Min Zhang
Abstract:
Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely-ambiguous labels. In this work, we introduce the first ICL-based framework with LLM for few-shot HTC. We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels. Particularly, we equip the retrieval database with HTC label-aware representations for the input texts, which is achieved by continual training on a pretrained language model with masked language modeling (MLM), layer-wise classification (CLS, specifically for HTC), and a novel divergent contrastive learning (DCL, mainly for adjacent semantically-similar labels) objective. Experimental results on three benchmark datasets demonstrate superior performance of our method, and we can achieve state-of-the-art results in few-shot HTC.
Submitted 29 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
LumberChunker: Long-Form Narrative Document Segmentation
Authors:
André V. Duarte,
João Marques,
Miguel Graça,
Miguel Freire,
Lei Li,
Arlindo L. Oliveira
Abstract:
Modern NLP tasks increasingly rely on dense retrieval methods to access up-to-date and relevant contextual information. We are motivated by the premise that retrieval benefits from segments that can vary in size such that a content's semantic independence is better captured. We propose LumberChunker, a method leveraging an LLM to dynamically segment documents, which iteratively prompts the LLM to identify the point within a group of sequential passages where the content begins to shift. To evaluate our method, we introduce GutenQA, a benchmark with 3000 "needle in a haystack" type of question-answer pairs derived from 100 public domain narrative books available on Project Gutenberg. Our experiments show that LumberChunker not only outperforms the most competitive baseline by 7.37% in retrieval performance (DCG@20) but also that, when integrated into a RAG pipeline, LumberChunker proves to be more effective than other chunking methods and competitive baselines, such as the Gemini 1.5M Pro. Our Code and Data are available at https://github.com/joaodsmarques/LumberChunker
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Probing the nature of the $χ_{c1}(3872)$ state using radiative decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1094 additional authors not shown)
Abstract:
The radiative decays $χ_{c1}(3872)\rightarrowψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}}{Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm 0.04 , $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.
Submitted 24 June, 2024;
originally announced June 2024.
-
The $L^p$ Poisson-Neumann problem and its relation to the Neumann problem
Authors:
Joseph Feneuil,
Linhan Li
Abstract:
We introduce the $L^p$ Poisson-Neumann problem for a uniformly elliptic operator $L=-\rm{div }A\nabla$ in divergence form in a bounded 1-sided Chord Arc Domain $Ω$, which considers solutions to $Lu=h-\rm{div}\vec{F}$ in $Ω$ with zero Neumann data on the boundary for $h$ and $\vec F$ in some tent spaces. We give different characterizations of solvability of the $L^p$ Poisson-Neumann problem and its weaker variants, and in particular, we show that solvability of the weak $L^p$ Poisson-Neumann problem is equivalent to a weak reverse Hölder inequality. We show that the Poisson-Neumann problem is closely related to the $L^p$ Neumann problem, whose solvability is a long-standing open problem. We are able to improve the extrapolation of the $L^p$ Neumann problem from Kenig and Pipher by obtaining an extrapolation result on the Poisson-Neumann problem.
Submitted 24 June, 2024;
originally announced June 2024.
-
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Authors:
Zezhong Wang,
Xingshan Zeng,
Weiwen Liu,
Yufei Wang,
Liangyou Li,
Yasheng Wang,
Lifeng Shang,
Xin Jiang,
Qun Liu,
Kam-Fai Wong
Abstract:
Current research found the issue of Early Answering in large language models (LLMs), where the models already have an answer before generating the Chain-of-Thought (CoT). This phenomenon suggests a potential lack of necessary dependency between the predicted answer and the reasoning process. Consequently, two important questions arise: (1) Is CoT still necessary if the model already has an answer? (2) Can the correctness of the answer serve as valid evidence for the correctness of CoT? To address these questions, we propose a method, namely Chain-of-Probe (CoP), to probe changes in the mind during the model's reasoning. The probing results show that in a significant number of question-answer cases, CoT appears to be unnecessary, and this necessity correlates with the simplicity of the task, defined by reasoning steps required. Furthermore, by analyzing patterns in mind change, we examine the correctness of the model's reasoning. Our validation reveals that many responses, although correct in their final answer, contain errors in their reasoning process. To this end, we propose a strategic approach based on CoP to prioritize answers with correct reasoning among multiple candidates, thereby bolstering the reliability of the model's reasoning.
Submitted 23 June, 2024;
originally announced June 2024.
-
Circuit Complexity of Sparse Quantum State Preparation
Authors:
Jingquan Luo,
Lvzhou Li
Abstract:
Quantum state preparation is a fundamental and significant subroutine in quantum computing. In this paper, we conduct a systematic investigation on the circuit size for sparse quantum state preparation. A quantum state is said to be $d$-sparse if it has only $d$ non-zero amplitudes. For the task of preparing $n$-qubit $d$-sparse quantum states, we obtain the following results:
(a) We propose the first approach that uses $o(dn)$ elementary gates without using ancillary qubits. Specifically, it is proven that any $n$-qubit $d$-sparse quantum state can be prepared by a quantum circuit of size $O(\frac{dn}{\log n} + n)$ without using ancillary qubits. This is asymptotically optimal when $d = poly(n)$, and this optimality extends to a broader scope under some reasonable assumptions.
(b) We show that any $n$-qubit $d$-sparse quantum state can be prepared by a quantum circuit of size $O(\frac{dn}{\log d})$ and depth $Θ(\log dn)$ using at most $O(\frac{n{d}}{\log d} )$ ancillary qubits, which not only reduces the circuit size compared to the one without ancillary qubits when $d = ω(poly(n))$, but also achieves the same asymptotically optimal depth while utilizing fewer ancillary qubits and applying fewer quantum gates compared to the result given in [PRL, 129, 230504(2022)]. (ii) We establish the lower bound $Ω(\frac{dn}{\log(n + m) + \log d} + n)$ on the circuit size with $m$ ancillary qubits available. We also obtain a slightly stronger lower bound under reasonable assumptions.
(c) We prove that with an arbitrary number of ancillary qubits available, the circuit size for preparing $n$-qubit $d$-sparse quantum states is $Θ({\frac{dn}{\log dn} + n})$.
Submitted 23 June, 2024;
originally announced June 2024.
-
Search for charmed baryons in the $Λ_c^+η$ system and measurement of the branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ and $pD^0$ relative to $Σ_c(2455)π$
Authors:
Belle Collaboration,
S. X. Li,
C. P. Shen,
I. Adachi,
J. K. Ahn,
H. Aihara,
D. M. Asner,
H. Atmacan,
T. Aushev,
R. Ayad,
Sw. Banerjee,
K. Belous,
J. Bennett,
M. Bessner,
T. Bilka,
D. Biswas,
D. Bodrov,
A. Bozek,
M. Bračko,
P. Branchini,
T. E. Browder,
A. Budano,
M. Campajola,
M. -C. Chang,
B. G. Cheon
, et al. (102 additional authors not shown)
Abstract:
We search for excited charmed baryons in the $Λ_c^+η$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λ_c^+η$ mass spectrum, including the known $Λ_c(2880)^+$ and $Λ_c(2940)^+$. Clear $Λ_c(2880)^+$ and $Λ_c(2940)^+$ signals are observed in the $pD^0$ mass spectrum. We set upper limits at 90\% credibility level on ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $Λ_c^+η$ relative to $Σ_c(2455)π$ of $<0.13$ for the $Λ_c(2880)^+$ and $<1.11$ for the $Λ_c(2940)^+$. We measure ratios of branching fractions of $Λ_c(2880)^+$ and $Λ_c(2940)^+$ decaying to $pD^0$ relative to $Σ_c(2455)π$ of $0.75 \pm 0.03(\text{stat.}) \pm 0.07(\text{syst.})$ for the $Λ_c(2880)^+$ and $3.59 \pm 0.21(\text{stat.}) \pm 0.56(\text{syst.})$ for the $Λ_c(2940)^+$.
Submitted 22 June, 2024;
originally announced June 2024.
-
Alternating-Chiral Charge Density Waves and Hybrid Ferrimagnetism in Monolayered NbTe2
Authors:
Yusong Bai,
Guohua Cao,
Jinghao Deng,
Haomin Fei,
Xiaoyu Lin,
Leiqiang Li,
Chao Zhu,
Zemin Pan,
Tao Jian,
Da Huo,
Zhengbo Cheng,
Chih-Kang Shih,
Ping Cui,
Chendong Zhang,
Zhenyu Zhang
Abstract:
Intertwining of different quantum degrees of freedom manifests exotic quantum phenomena in many-body systems, especially in reduced dimensionality. Here we show that monolayered NbTe2 serves as an ideal platform where lattice, charge, and spin degrees of freedom manifest cooperatively, leading to a new and threading order of chirality. By using spin-polarized scanning tunneling microscopy/spectroscopy, we reveal that the $\sqrt{19} \times \sqrt{19}$ phase of NbTe2 is encoded with both alternating-chiral atomic displacements and charge density waves, characterized by two chiral units of opposite handedness within the reconstructed cell. We show unambiguous evidence for emergent spin polarizations spreading over the primitive cell, with the magnetization orientation synchronized with alternating handedness of chiral order. Our first-principles studies identify the origin of intertwined orders being correlation driven, with the threading order of chirality emerging when the on-site Coulomb repulsion exceeds a critical value. The spin ordering is further shown to be of hybrid ferrimagnetic nature, contributed by the itinerant electrons and localized d-orbitals. Collectively, these findings expand the realm of chiral order in correlated electron systems, and facilitate an appealing platform for chiral spintronic and related applications.
Submitted 22 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies of 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for for the first time. No significant signal is observed, and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information on the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Authors:
Lichao Zhang,
Jia Yu,
Shuai Zhang,
Long Li,
Yangyang Zhong,
Guanbao Liang,
Yuming Yan,
Qing Ma,
Fangsheng Weng,
Fayu Pan,
Jing Li,
Renjun Xu,
Zhenzhong Lan
Abstract:
Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.
Submitted 21 June, 2024;
originally announced June 2024.
-
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models
Authors:
Haiquan Zhao,
Lingyu Li,
Shisong Chen,
Shuqi Kong,
Jiaan Wang,
Kexin Huang,
Tianle Gu,
Yixu Wang,
Dandan Liang,
Zhixu Li,
Yan Teng,
Yanghua Xiao,
Yingchun Wang
Abstract:
Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the impressive development of role-playing agents, we propose an ESC Evaluation framework (ESC-Eval), which uses a role-playing agent to interact with ESC models, followed by a manual evaluation of the interactive dialogues. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role, which behaves more like a confused person than GPT-4. Third, through ESC-Role and organized role cards, we systematically conduct experiments using 14 LLMs as the ESC models, including general AI-assistant LLMs (ChatGPT) and ESC-oriented LLMs (ExTES-Llama). We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models. The results show that ESC-oriented LLMs exhibit superior ESC abilities compared to general AI-assistant LLMs, but there is still a gap behind human performance. Moreover, to automate the scoring process for future ESC models, we developed ESC-RANK, which is trained on the annotated data and achieves a scoring performance that surpasses GPT-4 by 35 points. Our data and code are available at https://github.com/haidequanbu/ESC-Eval.
Submitted 24 June, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
Learning to Cover: Online Learning and Optimization with Irreversible Decisions
Authors:
Alexandre Jacquillat,
Michael Lingzhi Li
Abstract:
We define an online learning and optimization problem with irreversible decisions contributing toward a coverage target. At each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a machine learning model to guide future decisions. The goal is to minimize costs across a finite horizon under a chance constraint reflecting the coverage target. We derive an optimal algorithm and a tight lower bound in an asymptotic regime characterized by a large target number of facilities $m\to\infty$ but a finite horizon $T\in\mathbb{Z}_+$. We find that the regret grows sub-linearly at a rate $Θ\left(m^{\frac{1}{2}\cdot\frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $Θ(\sqrt{m})$. We establish the robustness of this result to the learning environment; we also extend it to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially for learning purposes, and fast exploitation later on for optimization purposes once uncertainty gets mitigated. These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
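To make the quoted regret rate concrete, the short Python sketch below evaluates the exponent $\frac{1}{2}\cdot\frac{1}{1-2^{-T}}$ for a few horizons $T$. The function name `regret_exponent` is hypothetical and the snippet only tabulates the formula stated in the abstract; it is not the authors' algorithm.

```python
def regret_exponent(T: int) -> float:
    """Exponent a(T) in the regret rate Theta(m ** a(T)) quoted in the abstract."""
    if T < 1:
        raise ValueError("horizon T must be a positive integer")
    return 0.5 / (1.0 - 2.0 ** (-T))


if __name__ == "__main__":
    for T in range(1, 7):
        a = regret_exponent(T)
        # The gap to 1/2 equals 1 / (2 * (2**T - 1)), so it shrinks
        # geometrically as the horizon grows.
        print(f"T={T}: exponent={a:.4f}, gap to 1/2={a - 0.5:.4f}")
```

Under this formula the exponent is 1 for $T=1$ and 2/3 for $T=2$, and the gap to 1/2 is $1/(2(2^T-1))$, which is the sense in which the rate converges exponentially fast to $Θ(\sqrt{m})$.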
Submitted 20 June, 2024;
originally announced June 2024.
-
Jailbreaking as a Reward Misspecification Problem
Authors:
Zhihui Xie,
Jiahui Gao,
Lei Li,
Zhenguo Li,
Qi Liu,
Lingpeng Kong
Abstract:
The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness and robustness in detecting harmful backdoor prompts. Building upon these insights, we present ReMiss, a system for automated red teaming that generates adversarial prompts against various target aligned LLMs. ReMiss achieves state-of-the-art attack success rates on the AdvBench benchmark while preserving the human readability of the generated prompts. Detailed analysis highlights the unique advantages brought by the proposed reward misspecification objective compared to previous methods.
Submitted 20 June, 2024;
originally announced June 2024.
-
Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices
Authors:
Li Wang,
Liang Li,
Lianming Xu,
Xian Peng,
Aiguo Fei
Abstract:
The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementations of deep learning based intelligent services on extremely resource-constrained Internet of Things (IoT) scenarios. Yet it raises great challenges to perform complicated inference tasks relying on a cluster of IoT devices that are heterogeneous in their computing/communication capacity and prone to crash or timeout failures. In this paper, we present RoCoIn, a robust cooperative inference mechanism for locally distributed execution of deep neural network-based inference tasks over heterogeneous edge devices. It creates a set of independent and compact student models that are learned from a large model using knowledge distillation for distributed deployment. In particular, the devices are strategically grouped to redundantly deploy and execute the same student model such that the inference process is resilient to any local failures, while a joint knowledge partition and student model assignment scheme are designed to minimize the response latency of the distributed inference system in the presence of devices with diverse capacities. Extensive simulations are conducted to corroborate the superior performance of our RoCoIn for distributed inference compared to several baselines, and the results demonstrate its efficacy in timely inference and failure resiliency.
Submitted 20 June, 2024;
originally announced June 2024.
-
Prediction and Reference Quality Adaptation for Learned Video Compression
Authors:
Xihua Sheng,
Li Li,
Dong Liu,
Houqiang Li
Abstract:
Temporal prediction is one of the most important technologies for video compression. Various prediction coding modes are designed in traditional video codecs. Traditional video codecs adaptively decide the optimal coding mode according to the prediction quality and reference quality. Recently, learned video codecs have made great progress. However, they ignore the prediction and reference quality adaptation, which leads to incorrect utilization of temporal prediction and reconstruction error propagation. Therefore, in this paper, we first propose a confidence-based prediction quality adaptation (PQA) module to provide explicit discrimination for the spatial and channel-wise prediction quality difference. With this module, low-quality predictions are suppressed and high-quality predictions are enhanced. The codec can adaptively decide which spatial or channel location of predictions to use. Then, we further propose a reference quality adaptation (RQA) module and an associated repeat-long training strategy to provide dynamic spatially variant filters for diverse reference qualities. With the filters, it is easier for our codec to achieve the target reconstruction quality according to reference qualities, thus reducing the propagation of reconstruction errors. Experimental results show that our codec obtains higher compression performance than the reference software of H.266/VVC and the previous state-of-the-art learned video codecs in both RGB and YUV420 colorspaces.
Submitted 20 June, 2024;
originally announced June 2024.
-
A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth
Authors:
Taixia Shi,
Dingding Liang,
Lu Wang,
Lin Li,
Shaogang Guo,
Jiawei Gao,
Xiaowei Li,
Chulun Lin,
Lei Shi,
Baogang Ding,
Shiyang Liu,
Fangyi Yang,
Chi Jiang,
Yang Chen
Abstract:
In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz. The IF LFM signal is converted to the optical domain via an intensity modulator and then filtered by a fiber Bragg grating (FBG) to generate only two 2nd-order optical LFM sidebands. In radar detection, the two optical LFM sidebands beat with each other to generate a frequency-and-bandwidth-quadrupled LFM signal, which is used for ranging, radial velocity measurement, and imaging. By changing the center frequency of the IF LFM signal, the radar function can be operated within 8 to 40 GHz. In spectrum sensing, one 2nd-order optical LFM sideband is selected by another FBG, which then works in conjunction with the stimulated Brillouin scattering gain spectrum to map the frequency of the signal under test to time with an instantaneous measurement bandwidth of 2 GHz. By using a frequency shift module to adjust the pump frequency, the frequency measurement range can be adjusted from 0 to 40 GHz. The prototype is comprehensively studied and tested. It achieves a range resolution of 3.75 cm, a range error of less than $\pm$ 2 cm, and a radial velocity error within $\pm$ 1 cm/s, delivers clear imaging of multiple small targets, and maintains a frequency measurement error of less than $\pm$ 7 MHz and a frequency resolution of better than 20 MHz.
Submitted 20 June, 2024;
originally announced June 2024.
-
EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration
Authors:
Ye Wang,
Jiahao Xun,
Mingjie Hong,
Jieming Zhu,
Tao Jin,
Wang Lin,
Haoyuan Li,
Linjun Li,
Yan Xia,
Zhou Zhao,
Zhenhua Dong
Abstract:
Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this limitation, we introduce EAGER, a novel generative recommendation framework that seamlessly integrates both behavioral and semantic information. Specifically, we identify three key challenges in combining these two types of information: a unified generative architecture capable of handling two feature types, ensuring sufficient and independent learning for each type, and fostering subtle interactions that enhance collaborative information utilization. To achieve these goals, we propose (1) a two-stream generation architecture leveraging a shared encoder and two separate decoders to decode behavior tokens and semantic tokens with a confidence-based ranking strategy; (2) a global contrastive task with summary tokens to achieve discriminative decoding for each type of information; and (3) a semantic-guided transfer task designed to implicitly promote cross-interactions through reconstruction and estimation objectives. We validate the effectiveness of EAGER on four public benchmarks, demonstrating its superior performance compared to existing methods.
Submitted 20 June, 2024;
originally announced June 2024.
-
Novae: An Important Source of Lithium in the Galaxy
Authors:
Jun Gao,
Chunhua Zhu,
Guoliang Lü,
Jinlong Yu,
Lin Li,
Helei Liu,
Sufen Guo
Abstract:
The source of Galactic lithium (Li) has long been a puzzle. With the discovery of Li in novae, extensive research has been conducted. However, there is still a significant disparity between the observed abundance of lithium in novae and existing theoretical predictions. Using the Modules for Experiments in Stellar Astrophysics (MESA), we simulate the evolution of novae with element diffusion and an appropriately increased amount of $^3$He in the mixtures. Element diffusion enhances the transport efficiency between the nuclear reaction zone and the convective region on the surface of the white dwarf during nova eruptions, which results in more $^7$Be being transported to the white dwarf surface and ultimately ejected. Compared to previous predictions, the abundance of $^7$Be in novae simulated in our model increases significantly, and the result can explain almost all observed novae. Using the method of population synthesis, we calculate the Li yield in the Galaxy. We find that the Galactic occurrence rate of novae is about 130 yr$^{-1}$, and about 110M Li produced by nova eruptions is ejected into the interstellar medium (ISM). About 73\% of Li in the Galactic ISM originates from novae, and approximately 15\%-20\% of that in the entire Galaxy. This means that novae are an important source of Li in the Galaxy.
Submitted 20 June, 2024;
originally announced June 2024.
-
Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio
Authors:
Li Li,
Shogo Seki
Abstract:
RemixIT and Remixed2Remixed are domain adaptation-based speech enhancement (DASE) methods that use a teacher model trained in full supervision to generate pseudo-paired data by remixing the outputs of the teacher model. The student model for enhancing real-world recorded signals is trained using the pseudo-paired data without ground truth. Since the noisy signals are recorded in natural environments, the dataset inevitably suffers from data imbalance in some acoustic properties, leading to subpar performance for the underrepresented data. The signal-to-noise ratio (SNR), inherently balanced in supervised learning, is a prime example. In this paper, we provide empirical evidence that the SNR of pseudo data has a significant impact on model performance using the dataset of the CHiME-7 UDASE task, highlighting the importance of balanced SNR in DASE. Furthermore, we propose adopting curriculum learning to encompass a broad range of SNRs to boost performance for underrepresented data.
Submitted 20 June, 2024;
originally announced June 2024.
-
Quantum analog to flapping of flags: interface instability for co-flow binary superfluids
Authors:
Yuping An,
Li Li,
Huabi Zeng
Abstract:
We study the interface dynamics in immiscible binary superfluids using their holographic description, which naturally consists of an inviscid superfluid component and a viscous normal fluid component. We give the first theoretical realization of interface instability for two superfluid components moving with identical velocity, providing a quantum analog to the flapping of flags that is common in daily life. This behavior is in sharp contrast to that of the Gross-Pitaevskii equation, for which no such co-flow instability develops in an isolated uniform system because of Galilean invariance. The real-time evolution triggered by the dynamical instability exhibits intricate nonlinear patterns leading to quantum turbulence reminiscent of the quantum Kelvin-Helmholtz instability. Moreover, we show that such interface dynamics is essentially different from the Landau instability, for which the frictionless flow becomes thermodynamically unstable above a critical superfluid velocity. Our study uncovers the rich interface dynamics of quantum fluids and the emergence of complex flow phenomena.
Submitted 19 June, 2024;
originally announced June 2024.
-
New QEC codes and EAQEC codes from repeated-root cyclic codes of length $2^rp^s$
Authors:
Lanqiang Li,
Ziwen Cao,
Tingting Wu,
Li Liu
Abstract:
Let $p$ be an odd prime and $r,s,m$ be positive integers. In this study, we first investigate the structure of all repeated-root cyclic codes of length $2^rp^s$ over the finite field $\mathbb{F}_{p^m}$ and their duals. Using the CSS and Steane constructions, we construct a series of new quantum error-correcting (QEC) codes with parameters distinct from all previous constructions. Furthermore, we provide all maximum distance separable (MDS) cyclic codes of length $2^rp^s$, which are further utilized in the construction of QEC MDS codes. Finally, we introduce a significant number of novel entanglement-assisted quantum error-correcting (EAQEC) codes derived from these repeated-root cyclic codes. Notably, these newly constructed codes exhibit parameters distinct from those of previously known constructions.
Submitted 19 June, 2024;
originally announced June 2024.
-
Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking
Authors:
Mohamed Elaraby,
Diane Litman,
Xiang Lorraine Li,
Ahmed Magooda
Abstract:
Generating free-text rationales is among the emergent capabilities of Large Language Models (LLMs). These rationales have been found to enhance LLM performance across various NLP tasks. Recently, there has been growing interest in using these rationales to provide insights for various important downstream tasks. In this paper, we analyze generated free-text rationales in tasks with subjective answers, emphasizing the importance of rationalization in such scenarios. We focus on pairwise argument ranking, a highly subjective task with significant potential for real-world applications, such as debate assistance. We evaluate the persuasiveness of rationales generated by nine LLMs to support their subjective choices. Our findings suggest that open-source LLMs, particularly Llama2-70B-chat, are capable of providing highly persuasive rationalizations, surpassing even GPT models. Additionally, our experiments show that rationale persuasiveness can be improved by controlling its parameters through prompting or through self-refinement.
Submitted 19 June, 2024;
originally announced June 2024.
-
Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning
Authors:
Danqing Wang,
Antonis Antoniades,
Kha-Dinh Luong,
Edwin Zhang,
Mert Kosan,
Jiachen Li,
Ambuj Singh,
William Yang Wang,
Lei Li
Abstract:
Counterfactual explanations of Graph Neural Networks (GNNs) offer a powerful way to understand data that can naturally be represented by a graph structure. Furthermore, in many domains, it is highly desirable to derive data-driven global explanations or rules that can better explain the high-level properties of the models and data in question. However, evaluating global counterfactual explanations is hard in real-world datasets due to a lack of human-annotated ground truth, which limits their use in areas like molecular sciences. Additionally, the increasing scale of these datasets provides a challenge for random search-based methods. In this paper, we develop a novel global explanation model RLHEX for molecular property prediction. It aligns the counterfactual explanations with human-defined principles, making the explanations more interpretable and easy for experts to evaluate. RLHEX includes a VAE-based graph generator to generate global explanations and an adapter to adjust the latent representation space to human-defined principles. Optimized by Proximal Policy Optimization (PPO), the global explanations produced by RLHEX cover 4.12% more input graphs and reduce the distance between the counterfactual explanation set and the input set by 0.47% on average across three molecular datasets. RLHEX provides a flexible framework to incorporate different human-designed principles into the counterfactual explanation generation process, aligning these explanations with domain expertise. The code and data are released at https://github.com/dqwang122/RLHEX.
Submitted 19 June, 2024;
originally announced June 2024.
-
First detection of coherent elastic neutrino-nucleus scattering on germanium
Authors:
S. Adamski,
M. Ahn,
P. S. Barbeau,
V. Belov,
I. Bernardi,
C. Bock,
A. Bolozdynya,
R. Bouabid,
J. Browning,
B. Cabrera-Palmer,
N. Cedarblade-Jones,
J. Colón Rivera,
E. Conley,
V. da Silva,
J. Daughhetee,
J. Detwiler,
K. Ding,
M. R. Durand,
Y. Efremenko,
S. R. Elliott,
A. Erlandson,
L. Fabris,
A. Galindo-Uribarri,
M. P. Green,
J. Hakenmüller
, et al. (62 additional authors not shown)
Abstract:
We report the first detection of coherent elastic neutrino-nucleus scattering (CEvNS) on germanium, measured at the Spallation Neutron Source at Oak Ridge National Laboratory. The Ge-Mini detector of the COHERENT collaboration employs large-mass, low-noise, high-purity germanium spectrometers, enabling excellent energy resolution and an analysis threshold of 1.5 keV electron-equivalent ionization energy. We observe an on-beam excess of 20.6$^{+7.1}_{-6.3}$ counts with a total exposure of 10.22 GWh kg, and we reject the no-CEvNS hypothesis with 3.9 sigma significance. The result agrees with the signal rate predicted by the standard model of particle physics within 2 sigma.
Submitted 19 June, 2024;
originally announced June 2024.
-
Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics
Authors:
Weitong Zhang,
Chengqi Zang,
Liu Li,
Sarah Cechnicka,
Cheng Ouyang,
Bernhard Kainz
Abstract:
Inverse problems describe the process of estimating the causal factors from a set of measurements or data. Mapping of often incomplete or degraded data to parameters is ill-posed, thus data-driven iterative solutions are required, for example when reconstructing clean images from poor signals. Diffusion models have shown promise as potent generative tools for solving inverse problems due to their superior reconstruction quality and their compatibility with iterative solvers. However, most existing approaches are limited to linear inverse problems represented as Stochastic Differential Equations (SDEs). This simplification falls short of addressing the challenging nature of real-world problems, leading to amplified cumulative errors and biases. We provide an explanation for this gap through the lens of measure-preserving dynamics of Random Dynamical Systems (RDS), with which we analyse Temporal Distribution Discrepancy and thus introduce a theoretical framework based on RDS for SDE diffusion models. We uncover several strategies that inherently enhance the stability and generalizability of diffusion models for inverse problems and introduce a novel score-based diffusion framework, the Dynamics-aware SDE Diffusion Generative Model (D$^3$GM). The measure-preserving property can return the degraded measurement to the original state despite complex degradation with the RDS concept of stability. Our extensive experimental results corroborate the effectiveness of D$^3$GM across multiple benchmarks including a prominent application for inverse problems, magnetic resonance imaging. Code and data will be publicly available.
Submitted 19 June, 2024;
originally announced June 2024.
-
Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach
Authors:
Xuehao Zhai,
Hanlin Tian,
Lintong Li,
Tianyu Zhao
Abstract:
Travel choice analysis is crucial for understanding individual travel behavior to develop appropriate transport policies and recommendation systems in Intelligent Transportation Systems (ITS). Despite extensive research, this domain faces two critical challenges: a) modeling with limited survey data, and b) simultaneously achieving high model explainability and accuracy. In this paper, we introduce a novel prompt-learning-based Large Language Model (LLM) framework that significantly improves prediction accuracy and provides explicit explanations for individual predictions. This framework involves three main steps: transforming input variables into textual form, building demonstrations similar to the object, and applying these to a well-trained LLM. We tested the framework's efficacy using two widely used choice datasets: London Passenger Mode Choice (LPMC) and Optima-Mode collected in Switzerland. The results indicate that the LLM significantly outperforms state-of-the-art deep learning methods and discrete choice models in predicting people's choices. Additionally, we present a case of explanation illustrating how the LLM framework generates understandable and explicit explanations at the individual level.
Submitted 22 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Demonstration of High-Efficiency Microwave Heating Producing Record Highly Charged Xenon Ion Beams with Superconducting ECR Ion Sources
Authors:
X. Wang,
J. B. Li,
V. Mironov,
J. W. Guo,
X. Z. Zhang,
O. Tarvainen,
Y. C. Feng,
L. X. Li,
J. D. Ma,
Z. H. Zhang,
W. Lu,
S. Bogomolov,
L. Sun,
H. W. Zhao
Abstract:
Intense highly charged ion beam production is essential for high-power heavy ion accelerators. A novel movable Vlasov launcher for superconducting high charge state Electron Cyclotron Resonance (ECR) ion source has been devised that can affect the microwave power effectiveness by a factor of about 4 in terms of highly charged ion beam production. This approach based on a dedicated microwave launching system instead of the traditional coupling scheme has led to new insight on microwave-plasma interaction. With this new understanding, the world record highly charged xenon ion beam currents have been enhanced by up to a factor of 2, which could directly and significantly enhance the performance of heavy ion accelerators and provide many new research opportunities in nuclear physics, atomic physics and other disciplines.
Submitted 25 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
Authors:
Kaikai An,
Fangkai Yang,
Liqun Li,
Junting Lu,
Sitao Cheng,
Lu Wang,
Pu Zhao,
Lele Cao,
Qingwei Lin,
Saravan Rajmohan,
Dongmei Zhang,
Qi Zhang
Abstract:
Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-connectivity. Extensive experiments across open-domain and industrial scenarios demonstrate that Thread outperforms existing data organization paradigms in RAG-based QA systems, significantly improving the handling of how-to questions.
Submitted 19 June, 2024;
originally announced June 2024.
-
EPR Steering Criterion and Monogamy Relation via Correlation Matrices in Tripartite Systems
Authors:
Li-Juan Li,
Xiao-Gang Fan,
Xue-Ke Song,
Liu Ye,
Dong Wang
Abstract:
Quantum steering is considered one of the most well-known nonlocal phenomena in quantum mechanics. Unlike entanglement and Bell non-locality, the asymmetry of quantum steering makes it vital for one-sided device-independent quantum information processing. Although there has been much progress on steering detection for bipartite systems, the criterion for EPR steering in tripartite systems remains challenging and inadequate. In this paper, we first derive a novel and promising steering criterion for arbitrary three-qubit states via correlation matrices. Furthermore, we propose a monogamy relation between the tripartite steering of the system and the bipartite steering of its subsystems based on the derived criterion. Finally, as illustrations, we demonstrate the performance of the steering criterion and the monogamy relation by means of several representative examples. We believe that the results and methods presented in this work could be beneficial for capturing genuine multipartite steering in the near future.
Submitted 19 June, 2024;
originally announced June 2024.
-
Controllable and Gradual Facial Blemishes Retouching via Physics-Based Modelling
Authors:
Chenhao Shuai,
Rizhao Cai,
Bandara Dissanayake,
Amanda Newman,
Dayan Guan,
Dennis Sng,
Ling Li,
Alex Kot
Abstract:
Face retouching aims to remove facial blemishes, such as pigmentation and acne, while retaining fine-grained texture details. Nevertheless, existing methods simply remove the blemishes and pay little attention to the realism of the intermediate process, limiting their use more to beautifying facial images on social media rather than being effective tools for simulating changes in facial pigmentation and acne. Motivated by this limitation, we propose our Controllable and Gradual Face Retouching (CGFR). Our CGFR is based on physical modelling, adopting Sum-of-Gaussians to approximate skin subsurface scattering in a decomposed melanin and haemoglobin color space. Our CGFR offers user-friendly control over the facial blemishes, achieving realistic and gradual blemish retouching. Experimental results based on actual clinical data show that CGFR can realistically simulate the gradual recovery process of blemishes.
Submitted 19 June, 2024;
originally announced June 2024.
-
T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation
Authors:
Lihuan Li,
Hao Xue,
Yang Song,
Flora Salim
Abstract:
Trajectory similarity computation is an essential technique for analyzing moving patterns of spatial data across various applications such as traffic management, wildlife tracking, and location-based services. Modern methods often apply deep learning techniques to approximate heuristic metrics but struggle to learn more robust and generalized representations from the vast amounts of unlabeled trajectory data. Recent approaches focus on self-supervised learning methods such as contrastive learning, which have made significant advancements in trajectory representation learning. However, contrastive learning-based methods heavily depend on manually pre-defined data augmentation schemes, limiting the diversity of generated trajectories and resulting in learning from such variations in 2D Euclidean space, which prevents capturing high-level semantic variations. To address these limitations, we propose T-JEPA, a self-supervised trajectory similarity computation method employing Joint-Embedding Predictive Architecture (JEPA) to enhance trajectory representation learning. T-JEPA samples and predicts trajectory information in representation space, enabling the model to infer the missing components of trajectories at high-level semantics without relying on domain knowledge or manual effort. Extensive experiments conducted on three urban trajectory datasets and two Foursquare datasets demonstrate the effectiveness of T-JEPA in trajectory similarity computation.
Submitted 13 June, 2024;
originally announced June 2024.