-
Unveiling the Importance of Longer Paths in Quantum Networks
Authors:
Xinqi Hu,
Gaogao Dong,
Renaud Lambiotte,
Kim Christensen,
**gfang Fan,
Lixin Tian,
Shlomo Havlin,
Xiangyi Meng
Abstract:
The advancement of quantum communication technologies calls for a better understanding of quantum network (QN) design from first principles, approached through network science. Pioneering studies have established a classical percolation mapping to model the task of entanglement transmission across QNs. Yet this mapping does not capture the stronger, but not fully understood, connectivity observed in QNs, which enables more efficient entanglement transmission than classical percolation predicts. In this work, we explore the critical phenomena of a potential statistical theory underlying this enhanced connectivity, known as concurrence percolation. Compared with classical percolation, the concurrence percolation mapping takes a distinctive approach of "superposing" path connectivities, using a different set of path connectivity rules and thereby boosting overall network connectivity. First, we analytically derive the percolation critical exponents for hierarchical, scale-free networks, in particular the UV flower model, which is characterized by two distinct network length scales, U$\leq$V. Our analysis confirms that classical and concurrence percolation, although both satisfy the hyperscaling relation, fall into separate universality classes. Most importantly, this separation stems from their different treatment of the contributions of non-shortest paths to overall connectivity. Notably, as the longer path scale V increases, both the critical threshold and the critical exponents of concurrence percolation retain a non-negligible dependence on V; compared with its classical counterpart, concurrence percolation therefore shows higher resilience to the weakening of non-shortest paths. This higher resilience is also observed in real-world network topologies, e.g., the Internet. Our findings reveal a first principle for QN design: longer paths still contribute significantly to QN connectivity, as long as they are abundant.
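As a rough illustration of the classical side of this comparison, the sketch below iterates the standard renormalization map for bond percolation on a UV flower, in which each bond is replaced at every generation by two parallel paths of U and V bonds. It applies the classical series rule (multiply probabilities along a path) and parallel rule (a pair conducts unless both paths fail); the concurrence-percolation rules from the paper are not reproduced here, and the function names are hypothetical.

```python
def flower_rg(p: float, u: int, v: int) -> float:
    """One renormalization step of classical bond percolation on a (u,v) flower.

    Each coarse bond becomes two parallel paths of u and v bonds. A path is
    open only if all its bonds are open (series rule, p**n); the pair is open
    if at least one path is open (parallel rule).
    """
    return 1.0 - (1.0 - p**u) * (1.0 - p**v)


def critical_threshold(u: int, v: int, tol: float = 1e-12) -> float:
    """Nontrivial fixed point p* = flower_rg(p*) in (0, 1), found by bisection.

    Requires u >= 2; for u == 1 there is no nontrivial threshold in (0, 1).
    """
    if u < 2:
        raise ValueError("a nontrivial threshold needs u >= 2")
    lo, hi = 1e-9, 1.0 - 1e-9
    # Below p* the map flows toward 0 (flower_rg(p) < p); above p* it flows toward 1.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flower_rg(mid, u, v) < mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the (2,2) flower the fixed-point condition reduces to $p^3 - 2p + 1 = 0$, whose nontrivial root is $(\sqrt{5}-1)/2 \approx 0.618$, which the bisection recovers.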
Submitted 23 February, 2024;
originally announced February 2024.
-
Modification of $χ_{c1}$(3872) and $ψ$(2$S$) production in $p$Pb collisions at $\sqrt{s_{NN}} = 8.16$ TeV
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1082 additional authors not shown)
Abstract:
The LHCb collaboration measures production of the exotic hadron $χ_{c1}$(3872) in proton-nucleus collisions for the first time. Comparison with the charmonium state $ψ$(2$S$) suggests that the exotic $χ_{c1}$(3872) experiences different dynamics in the nuclear medium than conventional hadrons, and comparison with data from proton-proton collisions indicates that the presence of the nucleus may modify $χ_{c1}$(3872) production rates. This is the first measurement of the nuclear modification factor of an exotic hadron.
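For reference, the nuclear modification factor is conventionally defined as the production cross-section in $p$Pb collisions divided by that in $pp$ collisions at the same $\sqrt{s_{NN}}$, scaled by the number of nucleons $A$ in the lead nucleus. The differential variables and binning used in this measurement are not stated in the abstract, so the form below is the generic definition rather than the analysis-specific one:

$$
R_{p\mathrm{Pb}} = \frac{1}{A}\,\frac{\sigma_{p\mathrm{Pb}}}{\sigma_{pp}}, \qquad A = 208,
$$

so that $R_{p\mathrm{Pb}} = 1$ corresponds to no net nuclear modification.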
Submitted 19 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer
Authors:
Xinshuo Hu,
Baotian Hu,
Dongfang Li,
Xiaoguang Li,
Lifeng Shang
Abstract:
The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context. Previous research has predominantly focused on hallucinations stemming from static input, as in summarization or machine translation. In contrast, we investigate the faithfulness of generative question answering in the presence of dynamic knowledge. Our objective is to determine whether hallucinations arise from parametric memory when the contextual knowledge changes, and to analyze why they occur. To address this issue efficiently, we propose a simple yet effective measure for detecting such hallucinations. Intriguingly, we find that all models tend to hallucinate by generating previous answers. To gain deeper insight into the causes of this phenomenon, we conduct a series of experiments that verify, from various perspectives, the critical role context plays in hallucination during both training and testing.
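A minimal sketch of one way to count such previous-answer hallucinations, assuming each test item records the answer supported by the original context, the answer supported by the changed context, and the model's prediction given the changed context. The exact-match criterion and the names `Example` and `previous_answer_rate` are hypothetical; the abstract does not specify the authors' measure.

```python
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


@dataclass
class Example:
    old_answer: str   # answer supported by the original context
    new_answer: str   # answer supported by the changed context
    prediction: str   # model output given the changed context


def previous_answer_rate(examples: list) -> float:
    """Fraction of examples whose prediction repeats the outdated answer (hypothetical measure)."""
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        pred = normalize(ex.prediction)
        # Count only predictions that match the old answer and not the new one.
        if pred == normalize(ex.old_answer) and pred != normalize(ex.new_answer):
            hits += 1
    return hits / len(examples)
```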
Submitted 22 February, 2024;
originally announced February 2024.
-
Structure-Based Drug Design via 3D Molecular Generative Pre-training and Sampling
Authors:
Yuwei Yang,
Siqi Ouyang,
Xueyu Hu,
Mingyue Zheng,
Hao Zhou,
Lei Li
Abstract:
Structure-based drug design aims to generate high-affinity ligands using prior knowledge of 3D target structures. Existing methods either use a conditional generative model to learn the distribution of 3D ligands given target binding sites, or iteratively modify molecules to optimize a structure-based activity estimator. The former is highly constrained by data quantity and quality, which makes optimization-based approaches more promising in practical scenarios. However, existing optimization-based approaches edit molecules in 2D space and use molecular docking to estimate activity from docking-predicted 3D target-ligand complexes. The misalignment between the action space and the objective hinders the performance of these models, especially those that employ deep learning for acceleration. In this work, we propose MolEdit3D, which combines 3D molecular generation with optimization frameworks. We develop a novel 3D graph editing model that generates molecules from fragments, and pre-train it on abundant 3D ligands to learn target-independent properties. We then employ a target-guided self-learning strategy to improve target-related properties using self-sampled molecules. MolEdit3D achieves state-of-the-art performance on the majority of evaluation metrics and demonstrates a strong capability to capture both target-dependent and target-independent properties.
Submitted 15 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Room-temperature sub-100 nm Néel-type skyrmions in non-stoichiometric van der Waals ferromagnet $\rm Fe_{3-x}GaTe_{2}$ with ultrafast laser writability
Authors:
Zefang Li,
Huai Zhang,
Guanqi Li,
Jiangteng Guo,
Qing** Wang,
Ying Deng,
Yue Hu,
Xuange Hu,
Can Liu,
Minghui Qin,
Xi Shen,
Richeng Yu,
Xingsen Gao,
Zhimin Liao,
Junming Liu,
Zhipeng Hou,
Yimei Zhu,
Xuewen Fu
Abstract:
Realizing room-temperature magnetic skyrmions in two-dimensional van der Waals ferromagnets offers unparalleled prospects for future spintronic applications. However, because intrinsic spin fluctuations suppress long-range atomic magnetic order and the inherent crystal inversion symmetry excludes the Dzyaloshinskii-Moriya interaction, achieving room-temperature skyrmions in 2D magnets remains a formidable challenge. In this study, we target the room-temperature 2D magnet $\rm Fe_3GaTe_2$ and show that introducing iron deficiency into this compound breaks spatial inversion symmetry, inducing a significant Dzyaloshinskii-Moriya interaction that gives rise to room-temperature Néel-type skyrmions of unprecedentedly small size. To further the practical applications of this finding, we use a homemade in-situ optical Lorentz transmission electron microscope to demonstrate ultrafast writing of skyrmions in $\rm Fe_{3-x}GaTe_2$ with a single femtosecond laser pulse. Our results establish $\rm Fe_{3-x}GaTe_2$ as a promising building block for realizing skyrmion-based magneto-optical functionalities.
Submitted 21 February, 2024;
originally announced February 2024.
-
An architecture for two-qubit encoding in neutral ytterbium-171 atoms
Authors:
Zhubing Jia,
William Huie,
Lintao Li,
Won Kyu Calvin Sun,
Xiye Hu,
Aakash,
Healey Kogan,
Abhishek Karve,
Jong Yeon Lee,
Jacob P. Covey
Abstract:
We present an architecture for encoding two qubits within the optical "clock" transition and nuclear spin-1/2 degree of freedom of neutral ytterbium-171 atoms. Inspired by recent high-fidelity control of all pairs of states within this four-dimensional ququart space, we present a toolbox for intra-ququart (single-atom) one- and two-qubit gates, inter-ququart (two-atom) Rydberg-based two- and four-qubit gates, and quantum nondemolition (QND) readout. We then use this toolbox to demonstrate the advantages of the ququart encoding for entanglement distillation and quantum error correction, which exhibit superior hardware efficiency and, in some cases, better performance because fewer two-atom (Rydberg-based) operations are required. Finally, leveraging single-state QND readout in our ququart encoding, we present a unique approach to studying interactive circuits, as well as to realizing a symmetry-protected topological phase of a spin-1 chain with a shallow, constant-depth circuit. These applications are all within reach of recent experiments with neutral ytterbium-171 atom arrays or with several trapped-ion species.
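To make the intra-ququart idea concrete, here is a minimal sketch of two qubits stored in the four levels of one atom. The level labeling (clock qubit $a$, nuclear-spin qubit $b$, level index $2a + b$) is an assumption for illustration, not necessarily the convention used in the paper; under it, a CNOT controlled by the clock qubit is simply a swap of two levels of the same atom, with no Rydberg interaction involved.

```python
import math

# Hypothetical labeling: qubit a is the optical clock qubit (0 = ground, 1 = excited),
# qubit b is the nuclear spin-1/2 (0 = down, 1 = up). Ququart level index = 2*a + b.


def level(a: int, b: int) -> int:
    """Map the two encoded qubit values to a ququart level index 0..3."""
    return 2 * a + b


def apply_permutation(amplitudes: list, perm: list) -> list:
    """Move the amplitude on level k to level perm[k] (a permutation unitary)."""
    out = [0j] * len(amplitudes)
    for k, amp in enumerate(amplitudes):
        out[perm[k]] += amp
    return out


# CNOT with the clock qubit as control: flip b only when a == 1,
# which swaps levels 2 and 3 of the same atom.
CNOT_CLOCK_CONTROL = [level(0, 0), level(0, 1), level(1, 1), level(1, 0)]

# (|a=0,b=0> + |a=1,b=0>)/sqrt(2) becomes (|00> + |11>)/sqrt(2).
plus_state = [1 / math.sqrt(2), 0, 1 / math.sqrt(2), 0]
bell_state = apply_permutation(plus_state, CNOT_CLOCK_CONTROL)
```

Applied to the equal superposition of levels 0 and 2, the permutation yields an equal superposition of levels 0 and 3, i.e. a Bell state of the two encoded qubits.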
Submitted 20 February, 2024;
originally announced February 2024.
-
Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling
Authors:
Guoqi Yu,
**g Zou,
Xiaowei Hu,
Angelica I. Aviles-Rivero,
**g Qin,
Shujun Wang
Abstract:
Predicting multivariate time series is crucial and demands precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. The distinctive trend characteristics of each time series pose challenges, and existing methods, which rely on basic moving-average kernels, may struggle with the non-linear structure and complex trends of real-world data. We therefore introduce a learnable decomposition strategy to capture dynamic trend information more reasonably. Additionally, we propose a dual attention module, implemented with channel-wise self-attention and autoregressive self-attention, that captures inter-series dependencies and intra-series variations simultaneously for better time series forecasting. To evaluate the effectiveness of our method, we conducted experiments on eight open-source datasets and compared it with state-of-the-art methods. The results show that our Leddam (LEarnable Decomposition and Dual Attention Module) not only significantly improves predictive performance, but that the proposed decomposition strategy can also be plugged into other methods to boost their performance substantially, reducing MSE by 11.87% to 48.56%.
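The decomposition step can be sketched as follows. A moving-average decomposition convolves the series with a fixed uniform kernel to obtain the trend and keeps the remainder as the seasonal part; a learnable decomposition instead treats the kernel weights as trainable parameters. This sketch shows only the forward pass with given weights; the Gaussian-shaped kernel is an assumption about how such a learnable kernel might be initialized, not a detail taken from the paper, and training it would require an autodiff framework.

```python
import math


def decompose(series: list, kernel: list) -> tuple:
    """Split a series into (trend, remainder) using a weighted smoothing kernel.

    The series is padded by repeating its end values so the trend keeps the
    original length. The kernel must have odd length; its weights are
    normalized to sum to 1.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    total = sum(kernel)
    weights = [w / total for w in kernel]
    half = k // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    # Centered weighted average: trend[i] is centered on padded[i + half] == series[i].
    trend = [
        sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(series))
    ]
    remainder = [x - t for x, t in zip(series, trend)]
    return trend, remainder


def moving_average_kernel(size: int) -> list:
    """Fixed uniform kernel, as in classic moving-average decomposition."""
    return [1.0] * size


def gaussian_kernel(size: int, sigma: float) -> list:
    """Gaussian-shaped weights: one plausible starting point for a learnable kernel."""
    half = size // 2
    return [math.exp(-((i - half) ** 2) / (2 * sigma**2)) for i in range(size)]
```

Because the kernel is just a list of weights, swapping the fixed uniform kernel for trainable weights changes nothing else in the decomposition, which is what makes the strategy easy to plug into other forecasting models.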
Submitted 1 July, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Authors:
Xinyu Hu,
Mingqi Gao,
Sen Hu,
Yang Zhang,
Yicheng Chen,
Teng Xu,
Xiaojun Wan
Abstract:
Some prior work has shown that LLMs perform well in NLG evaluation for different tasks. However, we discover that LLMs seem to confuse different evaluation criteria, which reduces their reliability. For further verification, we first set out to avoid the inconsistent conceptualization and vague expression found in existing NLG quality criteria themselves. We therefore summarize a clear hierarchical classification system for 11 common aspects, together with the corresponding criteria drawn from previous studies. Inspired by behavioral testing, we carefully design 18 types of aspect-targeted perturbation attacks for a fine-grained analysis of the evaluation behaviors of different LLMs. We also conduct human annotations beyond the guidance of the classification system to validate the impact of the perturbations. Our experimental results reveal confusion issues inherent in LLMs, as well as other noteworthy phenomena, and call for further research on and improvements to LLM-based evaluation.
Submitted 28 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Multiwavelength Polarization Observations of Mrk 501
Authors:
Xin-Ke Hu,
Yu-Wei Yu,
** Zhang,
Xiang-Gao Wang,
Kishore C. Patra,
Thomas G. Brink,
Wei-Kang Zheng,
Qi Wang,
De-Feng Kong,
Liang-Jun Chen,
Ji-Wang Zhou,
Jia-Xin Cao,
Ming-Xuan Lu,
Zi-Min Zhou,
Yi-Ning Wei,
Xin-Bo Huang,
Xing-Lin Li,
Hao Lou,
Ji-Rong Mao,
En-Wei Liang,
Alexei V. Filippenko
Abstract:
Mrk 501 is a prototypical high-synchrotron-peaked blazar (HBL) and one of the primary targets of the {\it Imaging X-ray Polarimetry Explorer} ({\it IXPE}). In this study, we report X-ray polarization measurements of Mrk 501 based on six {\it IXPE} observations. X-ray polarization is detected at a confidence level exceeding 99\% in four of the six observations over the entire {\it IXPE} energy range (2--8 keV). The maximum polarization degree ($Π_{\rm X}$) is measured to be $15.8\%\pm2.8\%$, with a polarization angle ($ψ_{\rm X}$) of $98.0°\pm5.1°$, at a confidence level of $5.6σ$. For the remaining two observations, only an upper limit of $Π_{\rm X} < 12\%$ could be derived at the 99\% confidence level. No temporal variability in polarization is observed across the six {\it IXPE} observations of Mrk 501. A discernible energy-dependent trend in the polarization degree is detected in optical spectropolarimetry; however, no analogous trend is observed in $Π_{\rm X}$. The chromatic behavior of $Π$, the consistent values of $ψ$ across frequencies from X-rays to radio waves, and the agreement between $ψ$ and the jet position angle strongly support the energy-stratified model with shock-accelerated particles in the jet of Mrk 501. Additionally, the possible presence of a global helical magnetic field in the jet of Mrk 501 is discussed.
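The quoted polarization degree and angle follow from the normalized Stokes parameters in the usual way, $Π = \sqrt{Q^2 + U^2}/I$ and $ψ = \tfrac{1}{2}\arctan(U/Q)$, evaluated with the quadrant-aware arctangent. A minimal sketch, assuming Stokes parameters that have already been calibrated (e.g. corrected for the instrument's modulation factor); it does not compute significances or upper limits.

```python
import math


def polarization_from_stokes(i: float, q: float, u: float) -> tuple:
    """Return (polarization degree, polarization angle in degrees) from Stokes I, Q, U.

    Degree: sqrt(Q^2 + U^2) / I. Angle: 0.5 * atan2(U, Q), mapped to [0, 180).
    """
    if i <= 0:
        raise ValueError("Stokes I must be positive")
    degree = math.hypot(q, u) / i
    # atan2 returns (-180, 180] degrees; halving and wrapping gives [0, 180).
    angle = (0.5 * math.degrees(math.atan2(u, q))) % 180.0
    return degree, angle
```

For example, $Q = Π\cos 2ψ$ and $U = Π\sin 2ψ$ with $Π = 0.158$ and $ψ = 98°$ map back to the same degree and angle.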
Submitted 3 July, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction
Authors:
Yinghui Li,
Shang Qin,
**gheng Ye,
Shirong Ma,
Yangning Li,
Libo Qin,
Xuming Hu,
Wenhao Jiang,
Hai-Tao Zheng,
Philip S. Yu
Abstract:
Recently, Large Language Models (LLMs) have been widely studied by researchers for their roles in various downstream NLP tasks. As a fundamental task in the NLP field, Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences. Previous studies have shown that LLMs' performance as correctors on CGEC remains unsatisfactory due to its challenging task focus. To help the CGEC field better adapt to the era of LLMs, we rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC. Considering the rich grammatical knowledge stored in LLMs and their powerful semantic understanding capabilities, we use LLMs as explainers that provide explanations to small CGEC models during error correction, enhancing their performance. We also use LLMs as evaluators to provide more reasonable CGEC evaluations, alleviating the problems caused by the subjectivity of the CGEC task. In particular, our work is also an active exploration of how LLMs and small models can better collaborate in downstream tasks. Extensive experiments and detailed analyses on widely used datasets verify the effectiveness of our intuition and the proposed methods.
Submitted 17 February, 2024;
originally announced February 2024.
-
When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models
Authors:
Yinghui Li,
Qingyu Zhou,
Yuanzhen Luo,
Shirong Ma,
Yangning Li,
Hai-Tao Zheng,
Xuming Hu,
Philip S. Yu
Abstract:
Recently, Large Language Models (LLMs) have made remarkable progress in language understanding and generation. Following this, various benchmarks for measuring all kinds of LLM capabilities have sprung up. In this paper, we challenge the reasoning and understanding abilities of LLMs by proposing a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp. Specifically, the cunning texts in FLUB mainly consist of tricky, humorous, and misleading texts collected from the real internet environment. We design three tasks of increasing difficulty in the FLUB benchmark to evaluate the fallacy understanding ability of LLMs. Using FLUB, we investigate the performance of multiple representative and advanced LLMs; the results show that FLUB is challenging and worthy of further study. Our extensive experiments and detailed analyses yield interesting discoveries and valuable insights. We hope that our benchmark will encourage the community to improve LLMs' ability to understand fallacies. Our data and code are available at https://github.com/THUKElab/FLUB.
Submitted 9 June, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Simulation-based Analysis of a Novel Loop-based Road Topology for Autonomous Vehicles
Authors:
Stefan Ramdhan,
Winnie Trandinh,
Sathurshan Arulmohan,
Xiayong Hu,
Spencer Deevy,
Victor Bandur,
Vera Pantelic,
Mark Lawford,
Alan Wassyng
Abstract:
The challenges in implementing SAE Level 4/5 autonomous vehicles are manifold, with intersection navigation being a pervasive one. We analyze a novel road topology invented by a co-author of this paper, Xiayong Hu. The topology eliminates the need for traditional traffic control and cross-traffic at intersections, potentially improving the safety of autonomous driving systems. The topology, herein called the Zonal Road Topology, consists of unidirectional loops of road with traffic flowing either clockwise or counter-clockwise. Adjacent loops are directionally aligned with one another, allowing vehicles to transfer from one loop to another through a simple lane change. To evaluate the Zonal Road Topology, a 1 km² pilot track near Changshu, China, is currently being set aside for testing. In parallel, traffic simulations are being performed. To this end, we conduct a simulation-based comparison between the Zonal Road Topology and a traditional road topology for a generic Electric Vehicle (EV) using the Simulation of Urban MObility (SUMO) platform and MATLAB/Simulink. We analyze the topologies in terms of their travel efficiency, safety, energy usage, and capacity. Drive time, number of halts, progress rate, and other metrics are analyzed across varied traffic levels to investigate the advantages and disadvantages of the Zonal Road Topology. Our results indicate that vehicles on the Zonal Road Topology have a lower, more consistent drive time with greater traffic throughput, while using less energy on average. These results become more prominent at higher traffic densities.
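Metrics such as drive time and number of halts can be computed from per-vehicle speed traces, which could be collected during a SUMO run, for example with TraCI's `traci.vehicle.getSpeed`. A minimal sketch, assuming speeds sampled at a fixed step and a halting threshold of 0.1 m/s; the threshold and the function name are assumptions, and the paper's exact metric definitions (including its progress rate) are not given in the abstract.

```python
HALT_SPEED = 0.1  # m/s; assumed threshold below which a vehicle counts as halted


def trip_metrics(speeds: list, step: float = 1.0) -> dict:
    """Drive time, number of halts, and mean speed from one vehicle's speed trace.

    `speeds` holds the speed (m/s) sampled every `step` seconds from departure
    to arrival. A halt is counted each time the speed drops below HALT_SPEED
    after the vehicle has been moving.
    """
    halts = 0
    halted = True  # vehicles start from rest; the departure itself is not a halt
    for v in speeds:
        if v < HALT_SPEED:
            if not halted:
                halts += 1
            halted = True
        else:
            halted = False
    drive_time = len(speeds) * step
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    return {"drive_time": drive_time, "halts": halts, "mean_speed": mean_speed}
```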
Submitted 2 February, 2024;
originally announced February 2024.
-
PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments
Authors:
Xiuzhong Hu,
Guangming Xiong,
Zheng Zang,
Peng Jia,
Yuxuan Han,
Junyi Ma
Abstract:
Large-scale 3D scene reconstruction and novel view synthesis are vital for autonomous vehicles, especially when using temporally sparse LiDAR frames. However, conventional explicit representations remain a significant bottleneck for representing reconstructed and synthetic scenes at unlimited resolution. Although the recently developed neural radiance fields (NeRF) have shown compelling results in implicit representation, large-scale 3D scene reconstruction and novel view synthesis using sparse LiDAR frames remain unexplored. To bridge this gap, we propose a 3D scene reconstruction and novel view synthesis framework called parent-child neural radiance field (PC-NeRF). Through its two modules, parent NeRF and child NeRF, the framework implements hierarchical spatial partitioning and multi-level scene representation at the scene, segment, and point levels. The multi-level scene representation makes efficient use of sparse LiDAR point cloud data and enables rapid acquisition of an approximate volumetric scene representation. Extensive experiments show that PC-NeRF achieves high-precision novel LiDAR view synthesis and 3D reconstruction in large-scale scenes. Moreover, PC-NeRF effectively handles situations with sparse LiDAR frames and demonstrates high deployment efficiency with limited training epochs. Our implementation and pre-trained models are available at https://github.com/biter0088/pc-nerf.
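Hierarchical spatial partitioning needs a way to decide which region a LiDAR ray passes through and over what depth range. A common building block, assuming the parent and child regions are axis-aligned boxes, is the slab ray-box intersection test sketched below; whether PC-NeRF partitions space exactly this way is an assumption, and the function name is hypothetical.

```python
def ray_box_interval(origin, direction, box_min, box_max):
    """Return (t_enter, t_exit) where origin + t * direction lies inside an
    axis-aligned box, or None if the ray misses it.

    Slab method: intersect the ray's parameter interval with each axis slab
    [lo, hi] in turn. Only t >= 0 (in front of the sensor) is considered.
    """
    t_enter, t_exit = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0:
            if o < lo or o > hi:
                return None  # ray is parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return None
    return t_enter, t_exit
```

Samples along each ray can then be restricted to $[t_{\text{enter}}, t_{\text{exit}}]$ of every box the ray crosses, so each child region only sees the LiDAR returns that actually pass through it.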
Submitted 14 February, 2024;
originally announced February 2024.
-
ScreenAgent: A Vision Language Model-driven Computer Control Agent
Authors:
Runliang Niu,
**dong Li,
Shiqi Wang,
Yali Fu,
Xiyu Hu,
Xueyuan Leng,
He Kong,
Yi Chang,
Qi Wang
Abstract:
Existing Large Language Models (LLMs) can invoke a variety of tools and APIs to complete complex tasks. The computer, as the most powerful and universal tool, could potentially be controlled directly by a trained LLM agent. With the computer at its disposal, such an agent could hopefully be generalized to assist humans in various everyday digital tasks. In this paper, we construct an environment for a Vision Language Model (VLM) agent to interact with a real computer screen. Within this environment, the agent can observe screenshots and manipulate the Graphical User Interface (GUI) by outputting mouse and keyboard actions. We also design an automated control pipeline that includes planning, acting, and reflecting phases, guiding the agent to continuously interact with the environment and complete multi-step tasks. Additionally, we construct the ScreenAgent Dataset, which collects screenshots and action sequences recorded while completing a variety of daily computer tasks. Finally, we trained a model, ScreenAgent, which achieves computer control capabilities comparable to GPT-4V and demonstrates more precise UI positioning. Our attempts could inspire further research on building a generalist LLM agent. The code is available at https://github.com/niuzaisheng/ScreenAgent.
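The planning, acting, and reflecting phases can be pictured as a simple control loop. In the sketch below, `plan_fn` and `check_fn` stand in for calls to the VLM, and `ToyScreen` replaces the real screen; all of these names, and the action vocabulary, are hypothetical and not ScreenAgent's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str      # hypothetical vocabulary: "click" or "type"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class ToyScreen:
    """Stand-in environment whose 'screenshot' is a text description of its state."""
    typed: str = ""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        return f"typed={self.typed!r}"

    def execute(self, action: Action) -> None:
        self.log.append(action)
        if action.kind == "type":
            self.typed += action.text


def run_episode(env: ToyScreen,
                plan_fn: Callable[[str], list],
                check_fn: Callable[[str], bool],
                max_steps: int = 10) -> bool:
    """Plan from the current screenshot, act step by step, and reflect after each action."""
    observation = env.screenshot()
    plan = plan_fn(observation)                  # planning phase
    for _ in range(max_steps):
        if not plan:
            plan = plan_fn(observation)          # re-plan when the current plan is used up
            if not plan:
                return False
        env.execute(plan.pop(0))                 # acting phase
        observation = env.screenshot()
        if check_fn(observation):                # reflecting phase: is the task done?
            return True
    return False
```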
Submitted 8 February, 2024;
originally announced February 2024.
-
PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes
Authors:
Xinggang Hu,
Yanmin Wu,
Mingyuan Zhao,
Linghao Yang,
Xiangkui Zhang,
Xiangyang Ji
Abstract:
Visual SLAM (Simultaneous Localization and Mapping) based on planar features has found widespread application in fields such as environmental structure perception and augmented reality. However, current research struggles with accurate localization and mapping in planar ambiguous scenes, primarily because of the poor accuracy of the planar features and data association methods employed. In this paper, we propose a visual SLAM system based on planar features and designed for planar ambiguous scenes, encompassing planar processing, data association, and multi-constraint factor graph optimization. We introduce a planar processing strategy that integrates semantic information with planar features, extracting the edges and vertices of planes for use in tasks such as plane selection, data association, and pose optimization. Next, we present an integrated data association strategy that combines plane parameters, semantic information, projection IoU (Intersection over Union), and non-parametric tests, achieving accurate and robust plane data association in planar ambiguous scenes. Finally, we design a set of multi-constraint factor graphs for camera pose optimization. Qualitative and quantitative experiments on publicly available datasets demonstrate that our proposed system competes effectively with state-of-the-art methods in both the accuracy and the robustness of map construction and camera localization.
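Part of the data-association step can be sketched as a gating test that accepts a candidate match only if the plane parameters agree and the projected plane regions overlap enough. This minimal sketch covers only the plane-parameter and projection-IoU checks (semantic labels and the non-parametric tests are omitted), represents projected regions as pixel sets, and uses placeholder thresholds; it assumes both planes use consistently oriented unit normals, and none of it is taken from the PAS-SLAM implementation.

```python
import math


def mask_iou(a: set, b: set) -> float:
    """Intersection over Union of two projected plane regions given as pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def same_plane(n1, d1, mask1, n2, d2, mask2,
               max_angle_deg=10.0, max_dist=0.05, min_iou=0.5) -> bool:
    """Decide whether an observed plane (n1, d1) matches a map plane (n2, d2).

    Planes are written as n . x + d = 0 with unit normals n. The thresholds
    are illustrative placeholders, not tuned values.
    """
    # Clamp the dot product to [-1, 1] to guard acos against rounding error.
    cos_angle = max(-1.0, min(1.0, sum(x * y for x, y in zip(n1, n2))))
    angle = math.degrees(math.acos(cos_angle))
    return (angle <= max_angle_deg
            and abs(d1 - d2) <= max_dist
            and mask_iou(mask1, mask2) >= min_iou)
```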
Submitted 8 February, 2024;
originally announced February 2024.
-
Measurement of the Branching Fraction of $B^{0} \rightarrow J/ψπ^{0}$ Decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
J. A. Adams,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1067 additional authors not shown)
Abstract:
The ratio of branching fractions between $B^{0} \rightarrow J/ψπ^{0}$ and $B^{+} \rightarrow J/ψK^{*+}$ decays is measured with proton-proton collision data collected by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. The measured value is $\frac{\mathcal{B}_{B^{0} \rightarrow J/ψπ^{0}}}{\mathcal{B}_{B^{+} \rightarrow J/ψK^{*+}}} = (1.153 \pm 0.053 \pm 0.048 ) \times 10^{-2}$, where the first uncertainty is statistical and the second is systematic. The branching fraction for $B^{0} \rightarrow J/ψπ^{0}$ decays is determined using the branching fraction of the normalisation channel, resulting in $\mathcal{B}_{B^{0} \rightarrow J/ψπ^{0}} = (1.670 \pm 0.077 \pm 0.069 \pm 0.095) \times 10^{-5}$, where the last uncertainty corresponds to that of the external input. This result is consistent with the current world average value and competitive with the most precise single measurement to date.
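The three quoted uncertainties on the branching fraction can be combined into a single total for a quick comparison with other measurements. A minimal sketch, assuming the statistical, systematic, and external-input uncertainties are independent so that they add in quadrature; the abstract itself quotes them separately.

```python
import math

# Branching fraction of B0 -> J/psi pi0 in units of 1e-5, with its statistical,
# systematic, and external-input uncertainties as quoted in the abstract.
value = 1.670
uncertainties = (0.077, 0.069, 0.095)

# Assuming the three sources are independent, add them in quadrature.
total = math.sqrt(sum(u**2 for u in uncertainties))
relative = total / value
print(f"B = ({value:.3f} +/- {total:.3f}) x 1e-5  ({100 * relative:.1f}% relative)")
```

This gives a total uncertainty of about $0.140 \times 10^{-5}$, roughly 8.4% of the central value.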
Submitted 23 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Observation of the $B_c^+ \to J/ψπ^+ π^0$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
J. A. Adams,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1064 additional authors not shown)
Abstract:
The first observation of the $B_c^+ \to J/ψπ^+ π^0$ decay is reported with high significance using proton-proton collision data, corresponding to an integrated luminosity of 9 fb$^{-1}$, collected with the LHCb detector at centre-of-mass energies of 7, 8, and 13 TeV. The ratio of its branching fraction relative to the $B_c^+ \to J/ψπ^+$ channel is measured to be
$$
\frac{ {\cal{B}}( B_c^+ \to J/ψπ^+π^0 ) }
{ {\cal{B}}( B_c^+ \to J/ψπ^+ ) }
= 2.80 \pm 0.15 \pm 0.11 \pm 0.16 \,,
$$ where the first uncertainty is statistical, the second systematic and the third related to imprecise knowledge of the branching fractions for $B^+ \to J/ψK^{*+}$ and $B^+ \to J/ψK^+$ decays, which are used to determine the $π^0$ detection efficiency. The $π^+π^0$ mass spectrum is found to be consistent with the dominance of an intermediate $ρ^+$ contribution in accordance with a model based on QCD factorisation.
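If the three uncertainties are treated as independent (an assumption made only for this illustration), they can be combined in quadrature into a single total uncertainty on the measured ratio:

$$
\sigma_{\rm tot} = \sqrt{0.15^2 + 0.11^2 + 0.16^2} = \sqrt{0.0602} \approx 0.25 \,,
$$

i.e. a total relative uncertainty of about $0.25/2.80 \approx 9\%$.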
Submitted 15 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Version age-based client scheduling policy for federated learning
Authors:
Xinyi Hu,
Nikolaos Pappas,
Howard H. Yang
Abstract:
Federated Learning (FL) has emerged as a privacy-preserving machine learning paradigm facilitating collaborative training across multiple clients without sharing local data. Despite advancements in edge device capabilities, communication bottlenecks present challenges in aggregating a large number of clients; only a portion of the clients can update their parameters upon each global aggregation. This phenomenon introduces the critical challenge of stragglers in FL and the profound impact of client scheduling policies on global model convergence and stability. Existing scheduling strategies address staleness but predominantly focus on either timeliness or content. Motivated by this, we introduce the novel concept of Version Age of Information (VAoI) to FL. Unlike traditional Age of Information metrics, VAoI considers both timeliness and content staleness. Each client's version age is updated discretely, indicating the freshness of information. VAoI is incorporated into the client scheduling policy to minimize the average VAoI, mitigating the impact of outdated local updates and enhancing the stability of FL systems.
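The abstract does not give the exact VAoI update or scheduling rule. The Python sketch below is a hypothetical illustration: a client's version age is taken to be the number of global model versions it is behind, and a greedy policy schedules the `k` clients with the largest version age each round. The function names, the tie-breaking rule, and the simulation parameters are assumptions, not the paper's policy.

```python
def schedule_round(client_versions, global_version, k):
    """Hypothetical greedy policy: schedule the k clients with the largest version age."""
    # Version age = how many global model versions a client is behind.
    ages = {cid: global_version - v for cid, v in client_versions.items()}
    # Largest age first; ties broken by client id so the result is deterministic.
    chosen = sorted(ages, key=lambda cid: (-ages[cid], cid))[:k]
    return chosen, ages


def simulate(num_clients=4, k=2, rounds=3):
    client_versions = {cid: 0 for cid in range(num_clients)}
    global_version = 0
    avg_vaoi = []
    for _ in range(rounds):
        chosen, ages = schedule_round(client_versions, global_version, k)
        avg_vaoi.append(sum(ages.values()) / num_clients)
        global_version += 1  # aggregation produces a new global model version
        for cid in chosen:
            client_versions[cid] = global_version  # scheduled clients receive it
    return avg_vaoi, client_versions


history, versions = simulate()
print(history)   # average version age per round
print(versions)  # global version last received by each client
```

Because the age only changes when a new global version is produced, it is updated in discrete increments, and always serving the most out-of-date clients keeps the average bounded.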
Submitted 7 February, 2024;
originally announced February 2024.
-
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
Authors:
Lijun Li,
Bowen Dong,
Ruohui Wang,
Xuhao Hu,
Wangmeng Zuo,
Dahua Lin,
Yu Qiao,
**g Shao
Abstract:
In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount. To meet this need, we propose \emph{SALAD-Bench}, a safety benchmark specifically designed for evaluating LLMs, attack methods, and defense methods. Distinguished by its breadth, SALAD-Bench goes beyond conventional benchmarks through its large scale, rich diversity, intricate three-level taxonomy, and versatile functionalities. SALAD-Bench is crafted with a meticulous array of questions, ranging from standard queries to complex ones enriched with attack and defense modifications, as well as multiple-choice questions. To manage the inherent complexity effectively, we introduce an innovative evaluator: the LLM-based MD-Judge for QA pairs, with a particular focus on attack-enhanced queries, ensuring seamless and reliable evaluation. These components extend SALAD-Bench from standard LLM safety evaluation to the evaluation of both LLM attack and defense methods, ensuring joint-purpose utility. Our extensive experiments shed light on the resilience of LLMs against emerging threats and the efficacy of contemporary defense tactics. Data and evaluator are released at https://github.com/OpenSafetyLab/SALAD-BENCH.
Submitted 7 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
The scaling limit of the volume of loop O(n) quadrangulations
Authors:
Élie Aïdékon,
William Da Silva,
XingJian Hu
Abstract:
We study the volume of rigid loop-$O(n)$ quadrangulations with a boundary of length $2p$ in the critical non-generic regime. We prove that, as the half-perimeter $p$ goes to infinity, the volume scales in distribution to an explicit random variable. This limiting random variable is described in terms of the multiplicative cascades of Chen, Curien and Maillard arXiv:1702.06916, or alternatively (in the dilute case) as the law of the area of a unit-boundary $γ$-quantum disc, as determined by Ang and Gwynne arXiv:1903.09120, for suitable $γ$. Our arguments go through a classification of the map into several regions, in which we rule out the contribution of bad regions and are left with a tractable portion of the map. One key observable for this classification is a Markov chain that explores the nested loops around a size-biased pick of a vertex in the map, making explicit the spinal structure of the discrete multiplicative cascade.
Submitted 7 February, 2024;
originally announced February 2024.
-
FaithLM: Towards Faithful Explanations for Large Language Models
Authors:
Yu-Neng Chuang,
Guanchu Wang,
Chia-Yuan Chang,
Ruixiang Tang,
Shaochen Zhong,
Fan Yang,
Mengnan Du,
Xuanting Cai,
Xia Hu
Abstract:
Large Language Models (LLMs) have become proficient in addressing complex tasks by leveraging their extensive internal knowledge and reasoning capabilities. However, the black-box nature of these models complicates the task of explaining their decision-making processes. While recent advancements demonstrate the potential of leveraging LLMs to self-explain their predictions through natural language (NL) explanations, these explanations may not accurately reflect the LLMs' decision-making process because the derived explanations are not optimized for fidelity. Measuring the fidelity of NL explanations is challenging, as it is difficult to manipulate the input context to mask the semantics of these explanations. To this end, we introduce FaithLM to explain the decisions of LLMs with NL explanations. Specifically, FaithLM evaluates the fidelity of NL explanations by incorporating contrary explanations into the query process. Moreover, FaithLM conducts an iterative process to improve the fidelity of derived explanations. Experimental results on three datasets from multiple domains demonstrate that FaithLM can significantly improve the fidelity of derived explanations, which also align better with the ground-truth explanations.
Submitted 26 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Fermi Large Area Telescope Detection of Gamma-Rays from NGC 6251 Radio Lobe
Authors:
Yu-Wei Yu,
Hai-Ming Zhang,
Ying-Ying Gan,
Xin-Ke Hu,
Tan-Zheng Wu,
** Zhang
Abstract:
We report the detection of extended $γ$-ray emission from the lobes of the radio galaxy NGC 6251 using observational data from the Fermi Large Area Telescope (Fermi-LAT). The maximum likelihood analysis shows that a radio morphology template fits the observational data better than a point-like source description at a confidence level of 8.1$σ$, and that the lobes contribute more than 50\% of the total $γ$-ray flux. Furthermore, the $γ$-ray energy spectra differ significantly in shape between the core and lobe regions, with a curved log-parabola shape in the core region and a power-law form in the lobes. Neither the core region nor the northwest lobe displays significant flux variations in the long-term $γ$-ray light curves. The broadband spectral energy distributions of both the core region and the northwest lobe can be well explained with a single-zone leptonic model. The $γ$-rays from the core region are attributed to the synchrotron self-Compton process, while the $γ$-rays from the northwest lobe are interpreted as inverse Compton scattering of cosmic microwave background photons.
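The two spectral shapes contrasted above can be written down explicitly. The Python sketch below uses the power-law and log-parabola parametrizations in the form commonly used in Fermi-LAT analyses, $dN/dE = N_0 (E/E_0)^{-\Gamma}$ and $dN/dE = N_0 (E/E_b)^{-(\alpha + \beta \ln(E/E_b))}$. The parameter values are illustrative assumptions, not the fitted values from the paper.

```python
import math


def power_law(energy, n0, gamma, e0):
    """dN/dE = N0 * (E / E0)^(-Gamma): a straight line in log-log space."""
    return n0 * (energy / e0) ** (-gamma)


def log_parabola(energy, n0, alpha, beta, eb):
    """dN/dE = N0 * (E / Eb)^-(alpha + beta * ln(E / Eb)); beta > 0 curves the spectrum down."""
    x = energy / eb
    return n0 * x ** (-(alpha + beta * math.log(x)))


# Illustrative parameters only (not fitted values from the paper).
for e in (0.1, 1.0, 10.0):
    print(e, power_law(e, 1.0, 2.0, 1.0), log_parabola(e, 1.0, 2.0, 0.1, 1.0))
```

With `beta = 0` the log-parabola reduces to a power law, so the curvature parameter is what distinguishes the core spectrum from the lobe spectrum.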
Submitted 22 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies
Authors:
Xixi Hu,
Bo Liu,
Xingchao Liu,
Qiang Liu
Abstract:
Diffusion-based imitation learning improves Behavioral Cloning (BC) on multi-modal decision-making, but at the cost of significantly slower inference due to the recursion in the diffusion process. This motivates the design of efficient policy generators that retain the ability to generate diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), known as probability flows. We reveal an intriguing connection between the conditional variance of their training loss and the discretization error of the ODEs. With this insight, we propose a variance-adaptive ODE solver that adjusts its step size at inference time, making AdaFlow an adaptive decision-maker that offers rapid inference without sacrificing diversity. Interestingly, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance across all dimensions, including success rate, behavioral diversity, and inference speed. The code is available at https://github.com/hxixixh/AdaFlow
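The abstract does not spell out AdaFlow's step-size rule. The Python sketch below only illustrates the general idea with a made-up rule, `dt = eta / variance` clipped to `[min_dt, 1 - t]`, so that a state with (near-)zero variance is integrated in a single Euler step, mirroring the one-step behavior described for uni-modal action distributions. The `velocity` and `variance` callables and the `eta` and `min_dt` parameters are hypothetical stand-ins for the learned quantities.

```python
def adaptive_euler(x0, velocity, variance, eta=0.1, min_dt=0.05, eps=1e-12):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with variance-scaled steps.

    Hypothetical rule: dt = eta / variance, clipped to [min_dt, 1 - t], so a
    zero-variance state is integrated in a single full step.
    """
    t, x, steps = 0.0, x0, 0
    while 1.0 - t > 1e-9:
        var = variance(x, t)
        dt = min(1.0 - t, max(min_dt, eta / (var + eps)))
        x = x + dt * velocity(x, t)  # explicit Euler step
        t += dt
        steps += 1
    return x, steps


# Constant velocity field: the exact endpoint is x0 + 2.0 in both cases.
print(adaptive_euler(0.0, lambda x, t: 2.0, lambda x, t: 0.0))  # one step
print(adaptive_euler(0.0, lambda x, t: 2.0, lambda x, t: 1.0))  # about ten steps
```

The lower bound `min_dt` guarantees termination even when the variance estimate is very large.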
Submitted 6 February, 2024;
originally announced February 2024.
-
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
Authors:
Xixu Hu,
Runkai Zheng,
**dong Wang,
Cheuk Hang Leung,
Qi Wu,
Xing Xie
Abstract:
Vision Transformers (ViTs) have gained prominence as a preferred choice for a wide range of computer vision tasks due to their exceptional performance. However, their widespread adoption has raised concerns about security in the face of malicious attacks. Most existing methods rely on empirical adjustments during the training process, lacking a clear theoretical foundation. In this study, we address this gap by introducing SpecFormer, which is specifically designed to enhance ViTs' resilience against adversarial attacks and is supported by carefully derived theoretical guarantees. We establish local Lipschitz bounds for the self-attention layer and introduce a novel approach, Maximum Singular Value Penalization (MSVP), to attain precise control over these bounds. We seamlessly integrate MSVP into ViTs' attention layers, using the power iteration method for enhanced computational efficiency. The modified model, SpecFormer, effectively reduces the spectral norms of attention weight matrices, thereby tightening the network's local Lipschitz bounds. This, in turn, leads to improved training efficiency and robustness. Extensive experiments on CIFAR and ImageNet datasets confirm SpecFormer's superior performance in defending against adversarial attacks.
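Power iteration estimates the largest singular value (the spectral norm) of a weight matrix without a full SVD. The Python sketch below shows the standard method on plain lists; the `msvp_penalty` helper, which simply adds the estimated spectral norms scaled by `lam`, is a hypothetical stand-in, since the abstract does not give the exact form of MSVP.

```python
import math


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def max_singular_value(a, iters=50):
    """Estimate sigma_max(A) by power iteration on A^T A."""
    at = transpose(a)
    v = [1.0] * len(a[0])  # assumes the start vector is not orthogonal to the top singular vector
    for _ in range(iters):
        w = matvec(at, matvec(a, v))  # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    av = matvec(a, v)
    return math.sqrt(sum(x * x for x in av))  # ||A v|| for unit-norm v


def msvp_penalty(weights, lam=0.01):
    """Hypothetical regularizer: lam times the sum of estimated spectral norms."""
    return lam * sum(max_singular_value(w) for w in weights)


print(max_singular_value([[3.0, 0.0], [0.0, 1.0]]))  # close to 3.0
```

In a training loop the vector `v` would typically be carried over between steps, so a single iteration per step is often enough to track a slowly changing weight matrix.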
Submitted 2 January, 2024;
originally announced February 2024.
-
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
Authors:
Shengyi Huang,
Quentin Gallouédec,
Florian Felten,
Antonin Raffin,
Rousslan Fernand Julien Dossa,
Yanxiao Zhao,
Ryan Sullivan,
Viktor Makoviychuk,
Denys Makoviichuk,
Mohamad H. Danesh,
Cyril Roumégous,
Jiayi Weng,
Chufan Chen,
Md Masudur Rahman,
João G. M. Araújo,
Guorui Quan,
Daniel Tan,
Timo Klein,
Rujikorn Charakorn,
Mark Towers,
Yann Berthelot,
Kinal Mehta,
Dipam Chakraborty,
Arjun KG,
Valentin Charraut
, et al. (8 additional authors not shown)
Abstract:
In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. At the time of writing, more than 25,000 runs have been tracked, for a cumulative duration of more than 8 years. Open RL Benchmark covers a wide range of RL libraries and reference implementations. Special care is taken to ensure that each experiment is precisely reproducible by providing not only the full parameters, but also the versions of the dependencies used to generate it. In addition, Open RL Benchmark comes with a command-line interface (CLI) for easily fetching data and generating figures to present the results. In this document, we include two case studies to demonstrate the usefulness of Open RL Benchmark in practice. To the best of our knowledge, Open RL Benchmark is the first RL benchmark of its kind, and the authors hope that it will improve and facilitate the work of researchers in the field.
Submitted 5 February, 2024;
originally announced February 2024.
-
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Authors:
Zirui Liu,
Jiayi Yuan,
Hongye **,
Shaochen Zhong,
Zhaozhuo Xu,
Vladimir Braverman,
Beidi Chen,
Xia Hu
Abstract:
Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores attention keys and values to avoid re-computations, significantly increases memory demands and becomes the new bottleneck in speed and memory usage. This memory demand increases with larger batch sizes and longer context lengths. Additionally, the inference speed is limited by the size of the KV cache, as the GPU's SRAM must load the entire KV cache from the main GPU memory for each token generated, leaving the computational cores idle during this process. A straightforward and effective solution to reduce KV cache size is quantization, which decreases the total bytes taken by the KV cache. However, there is a lack of in-depth studies that explore the element distribution of the KV cache to understand the difficulty and limitations of KV cache quantization. To fill the gap, we conducted a comprehensive study on the element distribution in the KV cache of popular LLMs. Our findings indicate that the key cache should be quantized per-channel, i.e., elements are grouped along the channel dimension and quantized together. In contrast, the value cache should be quantized per-token. From this analysis, we developed a tuning-free 2-bit KV cache quantization algorithm, named KIVI. With its hardware-friendly implementation, KIVI enables Llama (Llama-2), Falcon, and Mistral models to maintain almost the same quality while using $\mathbf{2.6\times}$ less peak memory (including the model weights). This reduction in memory usage enables up to $\mathbf{4\times}$ larger batch sizes, bringing $\mathbf{2.35\times \sim 3.47\times}$ throughput on real LLM inference workloads. The source code is available at https://github.com/jy-yuan/KIVI.
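The per-channel versus per-token distinction can be made concrete with a toy Python sketch of asymmetric (min/zero-point) 2-bit quantization. This is an illustrative assumption, not the KIVI implementation: it uses one quantization group per full channel or per full token and omits the hardware-friendly details the abstract mentions. The toy key matrix, with one large-magnitude channel, is made up for the example.

```python
def quantize_group(values, bits=2):
    """Asymmetric quantization of one group to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    return [c * scale + lo for c in codes]


def fake_quant(matrix, per_channel, bits=2):
    """Quantize and dequantize a [tokens x channels] matrix group by group.

    per_channel=True groups each column (one channel across tokens), as the
    abstract recommends for keys; per_channel=False groups each row (one token),
    as recommended for values.
    """
    groups = [list(c) for c in zip(*matrix)] if per_channel else [list(r) for r in matrix]
    recon = [dequantize_group(*quantize_group(g, bits)) for g in groups]
    return [list(r) for r in zip(*recon)] if per_channel else recon


def max_abs_error(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy key cache: channel 0 has large-magnitude entries, as an outlier channel might.
keys = [[10.0, 0.1, -0.2],
        [10.5, -0.1, 0.3],
        [9.5, 0.0, 0.1],
        [10.2, 0.05, -0.1]]
print(max_abs_error(keys, fake_quant(keys, per_channel=True)))
print(max_abs_error(keys, fake_quant(keys, per_channel=False)))
```

Grouping by channel confines the outlier channel's large range to its own group, so the small-magnitude channels keep a fine quantization step; grouping by token mixes both ranges into every group.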
Submitted 5 February, 2024;
originally announced February 2024.
-
Watt-level all polarization-maintaining femtosecond fiber laser source at 1100 nm for multicolor two-photon fluorescence excitation of fluorescent proteins
Authors:
Junpeng Wen,
Christian Pilger,
Wenlong Wang,
Raghu Erapaneedi,
Hao Xiu,
Yiheng Fan,
Xu Hu,
Thomas Huser,
Friedemann Kiefer,
Xiaoming Wei,
Zhongmin Yang
Abstract:
We demonstrate a compact watt-level all polarization-maintaining (PM) femtosecond fiber laser source at 1100 nm. The fiber laser source is seeded by an all-PM fiber mode-locked laser employing a nonlinear amplifying loop mirror. The seed laser can generate stable pulses at a fundamental repetition rate of 40.71 MHz with a signal-to-noise ratio of >100 dB and an integrated relative intensity noise of only ~0.061%. After two-stage external amplification and pulse compression, an output power of ~1.47 W (corresponding to a pulse energy of ~36.1 nJ) and a pulse duration of ~251 fs are obtained. The 1100 nm femtosecond fiber laser is then employed as the excitation light source for multicolor multi-photon fluorescence microscopy of Chinese hamster ovary (CHO) cells stably expressing red fluorescent proteins.
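The quoted pulse energy follows directly from the average power and the repetition rate, $E = P_{\rm avg}/f_{\rm rep}$. The Python sketch below reproduces that arithmetic and adds a crude peak-power estimate; the peak-power figure is not stated in the abstract and ignores the pulse-shape factor, so it should be read as an order of magnitude only.

```python
avg_power_w = 1.47       # average output power after amplification
rep_rate_hz = 40.71e6    # fundamental repetition rate of the seed
pulse_width_s = 251e-15  # compressed pulse duration

pulse_energy_j = avg_power_w / rep_rate_hz  # E = P_avg / f_rep
# Crude peak-power estimate E / tau; it ignores the pulse-shape factor
# (for a sech^2 pulse the peak is roughly 0.88 * E / tau).
peak_power_w = pulse_energy_j / pulse_width_s

print(f"pulse energy ~ {pulse_energy_j * 1e9:.1f} nJ")
print(f"peak power  ~ {peak_power_w / 1e3:.0f} kW (order of magnitude)")
```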
Submitted 3 February, 2024;
originally announced February 2024.
-
3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems
Authors:
Liang Zhang,
Jionghao Lin,
Conrad Borchers,
Meng Cao,
Xiangen Hu
Abstract:
Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework operates by first representing the data as a three-dimensional tensor, capturing dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using Generative AI models, tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance. Comparative analysis revealed GAN's superior reliability over GPT-4 in this context, underscoring its potential in addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.
Submitted 29 January, 2024;
originally announced February 2024.
-
LLM-based NLG Evaluation: Current Status and Challenges
Authors:
Mingqi Gao,
Xinyu Hu,
Jie Ruan,
Xiao Pu,
Xiaojun Wan
Abstract:
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial intelligence. Traditional evaluation metrics, which mainly capture content overlap (e.g., n-gram overlap) between system outputs and references, are far from satisfactory, and large language models (LLMs) such as ChatGPT have demonstrated great potential in NLG evaluation in recent years. Various automatic evaluation methods based on LLMs have been proposed, including metrics derived from LLMs, prompting LLMs, and fine-tuning LLMs with labeled evaluation data. In this survey, we first give a taxonomy of LLM-based NLG evaluation methods and discuss their respective pros and cons. We also discuss human-LLM collaboration for NLG evaluation. Lastly, we discuss several open problems in this area and point out future research directions.
Submitted 26 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Measurements of the branching fraction ratio $\cal{B}(φ\to μ^+μ^-)/\cal{B}(φ\to e^+e^-)$ with charm meson decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1080 additional authors not shown)
Abstract:
Measurements of the branching fraction ratio ${\cal{B}(φ\to μ^+ μ^-)/\cal{B}(φ\to e^+e^-)}$ with ${D_{s}^{+} \to π^{+} φ}$ and ${D^{+} \to π^{+} φ}$ decays, denoted $R^{s}_{φπ}$ and $R^{d}_{φπ}$, are presented. The analysis is performed using a dataset corresponding to an integrated luminosity of 5.4$\,\rm{fb}^{-1}$ of $pp$ collision data collected with the LHCb experiment. The branching fractions are normalised with respect to the ${B^{+} \to K^{+} J/ψ(\to e^+e^-)}$ and ${B^{+} \to K^{+} J/ψ(\to μ^+μ^-)}$ decay modes. The combination of the results yields $$ R_{φπ} = 1.022 \pm 0.012 \,({\rm stat}) \, \pm 0.048 \,({\rm syst}). $$ The result is compatible with previous measurements of the $φ\to \ell^{+}\ell^{-}$ branching fractions and predictions based on the Standard Model.
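As a rough illustration, assuming the statistical and systematic uncertainties are independent, they combine in quadrature; comparing the result with a reference value of exactly 1 (used here only as an approximate lepton-universality benchmark, not as the paper's Standard Model prediction) gives

$$
\sigma_{\rm tot} = \sqrt{0.012^2 + 0.048^2} \approx 0.0495 \,, \qquad \frac{R_{\phi\pi} - 1}{\sigma_{\rm tot}} \approx \frac{0.022}{0.0495} \approx 0.44 \,,
$$

a deviation well below one standard deviation.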
Submitted 1 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
A generalized essentially non-hourglass total Lagrangian SPH solid dynamics
Authors:
Dong Wu,
Xiao**g Tang,
Shuaihao Zhang,
Xiangyu Hu
Abstract:
In this paper, we tackle a persistent numerical instability within the total Lagrangian smoothed particle hydrodynamics (TLSPH) solid dynamics. Specifically, we address the hourglass modes that may grow and eventually deteriorate the reliability of simulations, particularly in scenarios characterized by large deformations. We propose a generalized essentially non-hourglass formulation based on volumetric-deviatoric stress decomposition, offering a general solution for elasticity, plasticity, anisotropy, and other material models. Comparing the standard SPH formulation with the original non-nested Laplacian operator applied in our previous work \cite{wu2023essentially} to handle the hourglass issues in standard elasticity, we introduce a correction for the discretization of shear stress that relies on the discrepancy produced by a tracing-back prediction of the initial inter-particle direction from the current deformation gradient. The present formulation, when applied to standard elastic materials, is able to recover the original Laplacian operator. Due to the dimensionless nature of the correction, this formulation handles complex material models in a very straightforward way. Furthermore, a magnitude limiter is introduced to minimize the correction in domains where the discrepancy is less pronounced. The present formulation is validated, with a single set of modeling parameters, through a series of benchmark cases, confirming good stability and accuracy across elastic, plastic, and anisotropic materials. To showcase its potential, the formulation is employed to simulate a complex problem involving viscous plastic Oobleck material, contacts, and very large deformation.
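The volumetric-deviatoric decomposition that the formulation builds on splits a stress tensor into an isotropic part, $\frac{1}{3}\mathrm{tr}(\sigma)\,I$, and a trace-free deviatoric (shear) part. The Python sketch below shows only this standard decomposition; the paper's hourglass correction to the shear-stress discretization is not reproduced, and the example tensor is arbitrary.

```python
def decompose_stress(sigma):
    """Split a 3x3 stress tensor into volumetric and deviatoric parts.

    volumetric = (tr(sigma) / 3) * I, deviatoric = sigma - volumetric (trace-free).
    """
    mean_stress = (sigma[0][0] + sigma[1][1] + sigma[2][2]) / 3.0
    volumetric = [[mean_stress if i == j else 0.0 for j in range(3)] for i in range(3)]
    deviatoric = [[sigma[i][j] - volumetric[i][j] for j in range(3)] for i in range(3)]
    return volumetric, deviatoric


sigma = [[3.0, 1.0, 0.0],
         [1.0, 6.0, 2.0],
         [0.0, 2.0, 9.0]]
vol, dev = decompose_stress(sigma)
print(vol)
print(dev)
```

Because the deviatoric part carries all of the shear response, a correction targeted at shear stress can be applied to it without disturbing the pressure-like volumetric part.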
Submitted 1 February, 2024;
originally announced February 2024.
-
Study of $CP$ violation in $B^0_{(s)} \to D K^{*}(892)^0$ decays with $D \to K π( ππ)$, $ ππ( ππ)$, and $KK$ final states
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1072 additional authors not shown)
Abstract:
A measurement of $CP$-violating observables associated with the interference of $B^0\to D^0 K^{*}(892)^0$ and $B^0\to \bar{D}^0 K^*(892)^0$ decay amplitudes is performed in the $D^0 \to K^{\mp}π^{\pm}(π^+π^-),$ $D^0 \to π^+π^-(π^+π^-)$, and $D^0\to K^+K^-$ final states using data collected by the LHCb experiment corresponding to an integrated luminosity of $9$ $\text{fb}^{-1}$. $CP$-violating observables related to the interference of $B^0_s\to D^0 \bar{K}^*(892)^0$ and $B_s^0\to \bar{D}^0 \bar{K}^*(892)^0$ are also measured, but no evidence for interference is found. The $B^0$ observables are used to constrain the parameter space of the CKM angle $γ$ and the hadronic parameters $r_{B^0}^{DK^*}$ and $δ_{B^0}^{DK^*}$ with inputs from other measurements. In a combined analysis, these measurements allow for four solutions in the parameter space, only one of which is consistent with the world average.
Submitted 13 May, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition
Authors:
Xu Hu,
Yuxi Wang,
Lue Fan,
Junsong Fan,
Junran Peng,
Zhen Lei,
Qing Li,
Zhaoxiang Zhang
Abstract:
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis, benefiting from its high-quality rendering results and real-time rendering speed. However, the 3D Gaussians learned by 3D-GS have ambiguous structures without any geometry constraints. This inherent issue in 3D-GS leads to rough boundaries when segmenting individual objects. To remedy these problems, we propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS that improves segmentation accuracy while preserving segmentation speed. Specifically, we introduce a Gaussian Decomposition scheme, which exploits the special structure of 3D Gaussians to identify and then decompose the boundary Gaussians. Moreover, to achieve fast interactive 3D segmentation, we introduce a novel training-free pipeline by lifting a 2D foundation model to 3D-GS. Extensive experiments demonstrate that our approach achieves high-quality 3D segmentation without rough boundary issues and can be easily applied to other scene editing tasks.
Submitted 17 May, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
PALoc: Advancing SLAM Benchmarking with Prior-Assisted 6-DoF Trajectory Generation and Uncertainty Estimation
Authors:
Xiangcheng Hu,
Linwei Zheng,
** Wu,
Ruoyu Geng,
Yang Yu,
Hexiang Wei,
Xiaoyu Tang,
Lujia Wang,
Jianhao Jiao,
Ming Liu
Abstract:
Accurately generating ground truth (GT) trajectories is essential for Simultaneous Localization and Mapping (SLAM) evaluation, particularly under varying environmental conditions. This study introduces a systematic approach employing a prior map-assisted framework for generating dense six-degree-of-freedom (6-DoF) GT poses for the first time, enhancing the fidelity of both indoor and outdoor SLAM datasets. Our method excels in handling degenerate and stationary conditions frequently encountered in SLAM datasets, thereby increasing robustness and precision. A significant aspect of our approach is the detailed derivation of covariances within the factor graph, enabling an in-depth analysis of pose uncertainty propagation. This analysis helps quantify specific pose uncertainties and enhances trajectory reliability from both theoretical and empirical perspectives. Additionally, we provide an open-source toolbox (https://github.com/JokerJohn/Cloud_Map_Evaluation) for map evaluation criteria, facilitating the indirect assessment of overall trajectory precision. Experimental results show at least a 30% improvement in map accuracy and a 20% increase in direct trajectory accuracy compared to the Iterative Closest Point (ICP) \cite{sharp2002icp} algorithm across diverse campus environments, with substantially enhanced robustness. Our open-source solution (https://github.com/JokerJohn/PALoc), extensively applied in the FusionPortable\cite{Jiao2022Mar} dataset, is geared towards SLAM benchmark dataset augmentation and represents a significant advancement in SLAM evaluation.
Submitted 6 February, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Higher-order topology in honeycomb lattice with Y-Kekulé distortions
Authors:
Yong-Cheng Jiang,
Toshikaze Kariyado,
Xiao Hu
Abstract:
We investigate higher-order topological states in a honeycomb lattice with Y-Kekulé distortions that preserve $C_{6v}$ crystalline symmetry. The gapped states under expanded and shrunken distortions are adiabatically connected to isolated hexamer and Y-shaped tetramer states, respectively; the former possesses nontrivial higher-order topology characterized by a $\mathbb{Z}_6$ invariant. Topological corner states appear in a flake structure with expanded distortion, where the hexamers are broken at the corners. Our work reveals that a honeycomb lattice with Y-Kekulé distortions is a promising platform for studying higher-order topological states.
Submitted 31 January, 2024;
originally announced January 2024.
-
Evolution of magnetic field of the Quasar 1604+159 at pc scale
Authors:
Xu-Zhi Hu,
Xiaoyu Hong,
Wei Zhao,
Liang Chen,
Wei-Yang Wang,
Linhui Wu
Abstract:
We have analyzed the total intensity, spectral index, linear polarization, and rotation measure (RM) distributions at pc scale for the quasar 1604+159. The source was observed with the VLBA in 2002 and 2020. Combining these data with the MOJAVE results, we studied the evolution of the magnetic field. We detected a core-jet structure; the jet extends to a distance of ~25 mas, and its shape varies slightly with time. We divided the source structure into a central region and a jet region. In the jet region, we find that the polarized emission varies with time; the flatter spectral index values and the EVPA direction indicate the possible existence of shocks contributing to this variation. In the central region, the derived core-shift index k_r values indicate that the core was close to equipartition in 2002 but deviated from it in 2020. The magnetic field strength measured in 2020 is two orders of magnitude lower than that in 2002. We detected transverse RM gradients in the core, evidence of a helical magnetic field. At 15 GHz, near the jet base, the polarization direction changes significantly with time, from perpendicular to parallel to the jet direction. The evolution of the RM and of the magnetic field structure are possible causes of this polarization change. The core |RM| in 2020 increases with frequency following a power law with index a = 2.7, suggesting a rapid fall-off of electron density in the medium with distance from the jet base.
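The frequency dependence of the core |RM| described above can be illustrated with a short sketch. Assuming |RM| = C ν^a, taking logarithms gives log|RM| = log C + a log ν, so an ordinary least-squares straight-line fit in log-log space returns the power-law index a as its slope. The helper name `fit_power_law_index` is hypothetical, and the input values are synthetic, noise-free numbers generated with a = 2.7; they are not the paper's measurements.

```python
import math


def fit_power_law_index(freqs_ghz, abs_rm):
    """Fit |RM| = C * nu**a by least squares on log|RM| versus log(nu).

    Returns (a, C). Unweighted fit; a simplifying assumption for illustration.
    """
    if len(freqs_ghz) != len(abs_rm) or len(set(freqs_ghz)) < 2:
        raise ValueError("need matching data at two or more distinct frequencies")
    xs = [math.log(f) for f in freqs_ghz]
    ys = [math.log(r) for r in abs_rm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # power-law index a
    intercept = mean_y - slope * mean_x    # log C
    return slope, math.exp(intercept)


# Synthetic values (NOT the paper's data): |RM| = 100 * nu**2.7
freqs = [5.0, 8.0, 15.0, 22.0, 43.0]
rms = [100.0 * f ** 2.7 for f in freqs]
a, c = fit_power_law_index(freqs, rms)
print(round(a, 3), round(c, 3))  # 2.7 100.0
```

With real multi-frequency measurements, one would usually weight each point by its RM uncertainty; the unweighted fit here only shows how the slope in log-log space maps onto the index a.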
Submitted 1 February, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
YingLong: Skillful High Resolution Regional Short Term Forecasting with Boundary Smoothing
Authors:
Pengbo Xu,
Tianyan Gao,
Yu Wang,
Jun** Yin,
Juan Zhang,
Xiaogu Zheng,
Zhimin Zhang,
Xiaoguang Hu,
Xiaoxu Chen
Abstract:
In numerical weather forecasting, achieving higher resolution demands more computational resources and time, whereas deep learning networks trained solely on data can significantly reduce the time required for forecasting. Recently, several artificial-intelligence-based global forecasting models have been developed, mainly trained on reanalysis datasets with a spatial resolution of approximately 25 km. However, regional forecasting calls for higher spatial resolution, and boundary information for the region also plays an important role, which is a major difference from global forecasting. Here we introduce 'YingLong', a high-resolution, artificial-intelligence-based model for short-term regional weather forecasting that predicts weather fields, including wind speed, temperature, and specific humidity, hourly at a 3 km resolution. YingLong uses a parallel structure of global and local blocks to capture multiscale meteorological features and is trained on an analysis dataset. Additionally, the necessary information around the regional boundary is introduced to YingLong through a boundary smoothing strategy, which significantly improves the regional forecasting results. Compared with forecasts from WRF-ARW, one of the best numerical prediction models, YingLong demonstrates superior forecasting performance in most cases, especially for surface variables.
Submitted 29 January, 2024;
originally announced January 2024.
-
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models
Authors:
**chang Hou,
Chang Ao,
Haihong Wu,
Xiangtao Kong,
Zhigang Zheng,
Daijia Tang,
Chengming Li,
Xi** Hu,
Ruifeng Xu,
Shiwen Ni,
Min Yang
Abstract:
With the accelerating development of Large Language Models (LLMs), many LLMs are beginning to be used in the Chinese K-12 education domain. LLMs are becoming increasingly integrated with education; however, there is currently no benchmark for evaluating LLMs that focuses on the Chinese K-12 education domain. There is therefore an urgent need for a comprehensive natural language processing benchmark to accurately assess the capabilities of various LLMs in this domain. To address this, we introduce E-EVAL, the first comprehensive evaluation benchmark specifically designed for the Chinese K-12 education field. E-EVAL consists of 4,351 multiple-choice questions at the primary, middle, and high school levels across a wide range of subjects, including Chinese, English, Politics, History, Ethics, Physics, Chemistry, Mathematics, and Geography. We conducted a comprehensive evaluation of advanced LLMs on E-EVAL, including both English-dominant and Chinese-dominant models. Findings show that Chinese-dominant models perform well compared to English-dominant models, with many even scoring above GPT 4.0. However, almost all models perform poorly in complex subjects such as mathematics. We also found that most Chinese-dominant LLMs did not achieve higher scores at the primary school level than at the middle school level. We observe that a model's mastery of higher-order knowledge does not necessarily imply mastery of lower-order knowledge. Additionally, the experimental results indicate that the Chain of Thought (CoT) technique is effective only for the challenging science subjects, while Few-shot prompting is more beneficial for liberal arts subjects. With E-EVAL, we aim to analyze the strengths and limitations of LLMs in educational applications and to contribute to the progress and development of both Chinese K-12 education and LLMs.
Submitted 29 January, 2024;
originally announced January 2024.
-
The Clock Distribution System for the ATLAS Liquid Argon Calorimeter Phase-I Upgrade Demonstrator
Authors:
Binwei Deng,
Hucheng Chen,
Kai Chen,
**ghong Chen,
Datao Gong,
Di Guo,
Xueye Hu,
De** Huang,
James Kierstead,
Xiaoting Li,
Chonghan Liu,
Tiankuan Liu,
Annie C. Xiang,
Hao Xu,
Tongye Xu,
Yang You,
**gbo Ye
Abstract:
A prototype Liquid-argon Trigger Digitizer Board (LTDB), called the LTDB Demonstrator, has been developed to demonstrate the functions of the ATLAS Liquid Argon Calorimeter Phase-I trigger electronics upgrade. Each Demonstrator carries forty analog-to-digital converters and four FPGAs with embedded multi-gigabit transceivers, all of which need high-quality clocks. A clock distribution system based on commercial components has been developed for the Demonstrator. This paper presents the design of the clock distribution system and its evaluated performance. The components used in the clock distribution system have been qualified to meet the radiation tolerance requirements of the Demonstrator.
Submitted 28 January, 2024;
originally announced January 2024.
-
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
Authors:
Samuel Pegg,
Kai Li,
Xiaolin Hu
Abstract:
Audio-visual speech separation has gained significant traction in recent years due to its potential applications in fields such as speech recognition, diarization, scene analysis, and assistive technologies. Designing a lightweight audio-visual speech separation network is important for low-latency applications, but existing methods often require higher computational costs and more parameters to achieve better separation performance. In this paper, we present Top-Down-Fusion Net (TDFNet), a state-of-the-art (SOTA) audio-visual speech separation model that builds upon the architecture of TDANet, an audio-only speech separation method. TDANet serves as the architectural foundation for both the auditory and visual networks within TDFNet, yielding an efficient model with fewer parameters. On the LRS2-2Mix dataset, TDFNet achieves a performance increase of up to 10% across all performance metrics compared with the previous SOTA method, CTCNet. Remarkably, these results are achieved with fewer parameters and only 28% of the multiply-accumulate operations (MACs) of CTCNet. In essence, our method offers a highly effective and efficient solution to speech separation in the audio-visual domain, making significant strides in exploiting visual information.
Submitted 25 January, 2024;
originally announced January 2024.
-
The Calibration Gap between Model and Human Confidence in Large Language Models
Authors:
Mark Steyvers,
Heliodoro Tejeda,
Aakriti Kumar,
Catarina Belem,
Sheer Karny,
Xinyue Hu,
Lukas Mayer,
Padhraic Smyth
Abstract:
For large language models (LLMs) to be trusted by humans, they need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments, but the question remains of how well LLMs can communicate this internal model confidence to human users. This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model. Through experiments involving multiple-choice questions, we systematically examine human users' ability to discern the reliability of LLM outputs. Our study focuses on two key areas: (1) assessing users' perception of true LLM confidence and (2) investigating the impact of tailored explanations on this perception. The research highlights that default explanations from LLMs often lead users to overestimate both the model's confidence and its accuracy. By modifying the explanations to more accurately reflect the LLM's internal confidence, we observe a significant shift in user perception, aligning it more closely with the model's actual confidence levels. This adjustment in explanatory approach shows potential for enhancing user trust and accuracy in assessing LLM outputs. The findings underscore the importance of transparent communication of confidence levels in LLMs, particularly in high-stakes applications where understanding the reliability of AI-generated information is essential.
Submitted 24 January, 2024;
originally announced January 2024.
-
Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
Authors:
Darren Liu,
Cheng Ding,
Delgersuren Bold,
Monique Bouvier,
Jiaying Lu,
Benjamin Shickel,
Craig S. Jabaley,
Wenhui Zhang,
Soo** Park,
Michael J. Young,
Mark S. Wainwright,
Gilles Clermont,
Parisa Rashidi,
Eric S. Rosenthal,
Laurie Dimisko,
Ran Xiao,
Joo Heung Yoon,
Carl Yang,
Xiao Hu
Abstract:
The field of healthcare has increasingly turned its attention to Large Language Models (LLMs) because of their remarkable performance. However, their performance in actual clinical applications remains underexplored. Traditional evaluations based on question-answering tasks do not fully capture nuanced clinical contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was evaluated by identifying the temporality and negation of these concepts using different prompts for an in-depth analysis. Results: GPT-4 showed overall superior performance compared to the other LLMs. By contrast, GPT-3.5 and text-davinci-003 showed improved performance when appropriate prompting strategies were employed. The GPT family models demonstrated considerable efficiency, as evidenced by their cost-effectiveness and time savings. Conclusion: A comprehensive qualitative performance evaluation framework for LLMs is developed and operationalized. This framework goes beyond singular performance aspects. With expert annotations, this methodology not only validates LLMs' capabilities in processing complex medical data but also establishes a benchmark for future LLM evaluations across specialized domains.
Submitted 24 January, 2024;
originally announced January 2024.
-
Boundary and Relation Distillation for Semantic Segmentation
Authors:
Dong Zhang,
**cheng Dong,
Xinting Hu,
Long Chen,
Kwang-Ting Cheng
Abstract:
Recently, it has been revealed that small semantic segmentation (SS) models tend to make errors in maintaining boundary region completeness and preserving target region connectivity, despite effectively segmenting the main object regions. To address these errors, we propose a targeted boundary and relation distillation (BRD) strategy that uses knowledge distillation from large teacher models to small student models. Specifically, the boundary distillation extracts explicit object boundaries from the hierarchical feature maps of the backbone network, thereby enhancing the student model's mask quality in boundary regions. Concurrently, the relation distillation transfers implicit relations from the teacher model to the student model using pixel-level self-relation as a bridge, ensuring that the student's mask has strong target region connectivity. The proposed BRD is designed specifically for SS and is characterized by simplicity and efficiency. Through experimental evaluations on multiple SS datasets, including Pascal VOC 2012, Cityscapes, ADE20K, and COCO-Stuff 10K, we demonstrate that BRD significantly surpasses current methods without increasing inference costs, generating crisp region boundaries and smooth connecting regions that are challenging for small models.
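The idea of using pixel-level self-relation as a bridge between teacher and student can be sketched as follows. This is a hypothetical formulation, since the abstract does not give the exact one: it assumes the self-relation is the cosine-similarity matrix between all pairs of pixel feature vectors and that the relation loss is the mean squared difference between the teacher and student matrices. Both matrices are N x N for N pixels, so teacher and student features may have different channel widths. The function names are illustrative.

```python
import math


def self_relation(features):
    """Return the N x N cosine-similarity matrix of N pixel feature vectors."""
    normed = []
    for vec in features:
        norm = math.sqrt(sum(c * c for c in vec)) or 1.0  # guard against all-zero vectors
        normed.append([c / norm for c in vec])
    return [[sum(a * b for a, b in zip(u, w)) for w in normed] for u in normed]


def relation_distillation_loss(teacher_feats, student_feats):
    """Mean squared difference between teacher and student self-relation matrices.

    Hypothetical loss for illustration; channel widths may differ between
    teacher and student, but both must describe the same N pixels.
    """
    if len(teacher_feats) != len(student_feats):
        raise ValueError("teacher and student must describe the same pixels")
    rt = self_relation(teacher_feats)
    rs = self_relation(student_feats)
    n = len(rt)
    return sum((rt[i][j] - rs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Three pixels; the teacher has 4 channels, the student only 2.
teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
student_good = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # same pairwise angles as the teacher
student_bad = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # pixel relations scrambled

print(relation_distillation_loss(teacher, student_good))  # close to 0.0
print(relation_distillation_loss(teacher, student_bad))   # 4/9, about 0.444
```

Matching relations rather than raw features means the student only has to reproduce how pixels relate to one another, which is one way a narrower student can still inherit the teacher's region-connectivity structure.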
Submitted 23 January, 2024;
originally announced January 2024.
-
Assessing and Understanding Creativity in Large Language Models
Authors:
Yunpu Zhao,
Rui Zhang,
Wenyi Li,
Di Huang,
Jiaming Guo,
Shaohui Peng,
Yifan Hao,
Yuanbo Wen,
Xing Hu,
Zidong Du,
Qi Guo,
Ling Li,
Yunji Chen
Abstract:
In the field of natural language processing, the rapid development of large language models (LLMs) has attracted increasing attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity are inadequate. Assessing LLM creativity must account for differences from humans, requiring multi-dimensional measurement while balancing accuracy and efficiency. This paper aims to establish an efficient framework for assessing the level of creativity in LLMs. By adapting the modified Torrance Tests of Creative Thinking, the research evaluates the creative performance of various LLMs across 7 tasks, emphasizing 4 criteria: Fluency, Flexibility, Originality, and Elaboration. In this context, we develop a comprehensive dataset of 700 questions for testing and an LLM-based evaluation method. In addition, this study presents a novel analysis of LLMs' responses to diverse prompts and role-play situations. We found that the creativity of LLMs falls short primarily in originality, while excelling in elaboration. Moreover, the use of prompts and the role-play settings of the model significantly influence creativity. The experimental results also indicate that collaboration among multiple LLMs can enhance originality. Notably, our findings reveal a consensus between human evaluations and LLMs regarding the personality traits that influence creativity. The findings underscore the significant impact of LLM design on creativity and bridge artificial intelligence and human creativity, offering insights into LLMs' creativity and potential applications.
Submitted 23 January, 2024;
originally announced January 2024.
-
Positivstellensätze and Moment problems with Universal Quantifiers
Authors:
Xiaomeng Hu,
Igor Klep,
Jiawang Nie
Abstract:
This paper studies Positivstellensätze and moment problems for sets that are given by universal quantifiers. Let $Q$ be a closed set and let $g = (g_1,\ldots,g_s)$ be a tuple of polynomials in two vector variables $x$ and $y$. Then $K$ is described as the set of all points $x$ such that each $g_j(x, y) \ge 0$ for all $y \in Q$. Fix a measure $ν$ with $\mathrm{supp}(ν) = Q$, and assume it satisfies the Carleman condition.
The first main result of the paper is a Positivstellensatz with universal quantifiers: if a polynomial $f(x)$ is positive on $K$, then it belongs to the quadratic module $QM(g,ν)$ associated to $(g,ν)$, under the archimedeanness assumption on $QM(g,ν)$. Here, $QM(g,ν)$ denotes the quadratic module of polynomials in $x$ that can be represented as \[τ_0(x) + \int τ_1(x,y)g_1(x, y)\, dν(y) + \cdots + \int τ_s(x,y) g_s(x, y)\, dν(y), \] where each $τ_j$ is a sum of squares polynomial.
Second, necessary and sufficient conditions for a full (or truncated) multisequence to admit a representing measure supported in $K$ are given. In particular, the classical flat extension theorem of Curto and Fialkow is generalized to truncated moment problems on such a set $K$. Finally, applications of these results for solving semi-infinite optimization problems are presented.
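A minimal worked instance of the quadratic-module representation above, under assumptions chosen purely for illustration (they are not taken from the paper): take scalar $x$ and $y$, $s = 1$, $Q = [-1,1]$, $g_1(x,y) = 1 - x^2y^2$, and $ν$ the uniform probability measure on $[-1,1]$, which has $\mathrm{supp}(ν) = Q$ and, being compactly supported, satisfies the Carleman condition. Since $1 - x^2y^2 \ge 0$ for every $y \in [-1,1]$ exactly when $x^2 \le 1$,
\[ K = \{x : 1 - x^2y^2 \ge 0 \ \ \forall y \in [-1,1]\} = [-1,1]. \]
Using $\int y^2\, dν(y) = \tfrac{1}{2}\int_{-1}^{1} y^2\, dy = \tfrac{1}{3}$, integrating against $ν$ removes the quantified variable $y$ and leaves a polynomial in $x$ alone:
\[ \int g_1(x,y)\, dν(y) = 1 - \tfrac{x^2}{3}. \]
Choosing the sum-of-squares multipliers $τ_0 = 1$ and $τ_1 = 1$ then gives
\[ f(x) = τ_0(x) + \int τ_1(x,y)\, g_1(x,y)\, dν(y) = 2 - \tfrac{x^2}{3} \in QM(g,ν). \]
Membership certifies $f \ge 0$ on $K$, because $τ_0 \ge 0$ and $τ_1 g_1 \ge 0$ on $K \times Q$; here direct evaluation confirms the stronger bound $f(x) \ge 2 - \tfrac{1}{3} = \tfrac{5}{3} > 0$ on $K$.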
Submitted 22 January, 2024;
originally announced January 2024.
-
Prompt and nonprompt $ψ(2S)$ production in $p$Pb collisions at $\sqrt{s_{NN}}=8.16$ TeV
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
H. Afsharnia,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey, et al. (1079 additional authors not shown)
Abstract:
The production of $ψ(2S)$ mesons in proton-lead collisions at a centre-of-mass energy per nucleon pair of $\sqrt{s_{NN}}=8.16$ TeV is studied with the LHCb detector using data corresponding to an integrated luminosity of 34 nb$^{-1}$. The prompt and nonprompt $ψ(2S)$ production cross-sections and the ratio of the $ψ(2S)$ to $J/ψ$ cross-sections are measured as functions of the meson transverse momentum and rapidity in the nucleon-nucleon centre-of-mass frame, together with forward-to-backward ratios and nuclear modification factors. Relative to $pp$ collisions, prompt $ψ(2S)$ production is observed to be more suppressed than prompt $J/ψ$ production, while nonprompt $ψ(2S)$ and $J/ψ$ production show similar suppression factors.
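For reference, the nuclear modification factor and forward-to-backward ratio named above are conventionally defined as follows (standard conventions assumed here; the abstract does not restate them), with $A = 208$ the lead mass number, $y^*$ the rapidity in the nucleon-nucleon centre-of-mass frame, $p_{\mathrm{T}}$ the transverse momentum, and "Pb$p$" denoting the beam configuration with the lead beam travelling in the forward direction:
\[ R_{p\mathrm{Pb}}(p_{\mathrm{T}}, y^*) = \frac{1}{A}\,\frac{\mathrm{d}^2σ_{p\mathrm{Pb}}/\mathrm{d}p_{\mathrm{T}}\,\mathrm{d}y^*}{\mathrm{d}^2σ_{pp}/\mathrm{d}p_{\mathrm{T}}\,\mathrm{d}y^*}, \qquad R_{\mathrm{FB}}(p_{\mathrm{T}}, |y^*|) = \frac{\mathrm{d}^2σ_{p\mathrm{Pb}}(+|y^*|)/\mathrm{d}p_{\mathrm{T}}\,\mathrm{d}y^*}{\mathrm{d}^2σ_{\mathrm{Pb}p}(-|y^*|)/\mathrm{d}p_{\mathrm{T}}\,\mathrm{d}y^*}. \]
A value $R_{p\mathrm{Pb}} < 1$ indicates suppression relative to a superposition of independent nucleon-nucleon collisions, so stronger suppression of prompt $ψ(2S)$ than of prompt $J/ψ$ corresponds to a smaller $R_{p\mathrm{Pb}}$ for the $ψ(2S)$.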
Submitted 22 April, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Multidimensional nonhomogeneous quasi-linear systems and their Hamiltonian structures
Authors:
Xin Hu,
Matteo Casati
Abstract:
In this paper we investigate multidimensional first-order quasi-linear systems and find necessary conditions for them to admit a Hamiltonian formulation. The insufficiency of the conditions is related to the Poisson cohomology of the admissible Hamiltonian operators. We present in detail the examples of two-dimensional, two-component systems of hydrodynamic type and of a real reduction of the 3-wave system.
Submitted 1 April, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Diurnal ejection of boulder clusters on comet 67P lasting beyond 3 AU
Authors:
Xian Shi,
Xuanyu Hu,
Jessica Agarwal,
Carsten Güttler,
Martin Rose,
Horst Uwe Keller,
Marco Fulle,
Jakob Deller,
Holger Sierks
Abstract:
Ejection of large boulder-like debris is a vigorous form of cometary activity that is unlikely to be driven by water-ice outgassing alone and is more likely associated with the sublimation of super-volatile ices. Although such activity has been observed on several comets, its actual pattern and mechanism remain unclear. Here we report on a dedicated observation of ejections of decimeter- to meter-sized boulders on comet 67P/Churyumov-Gerasimenko, made outbound between 2.5 and 3.3 AU from the Sun. These events, sharing a common source region, recurred in the local morning. The elongated boulders were ejected in clusters at low inclinations, comparable to the solar elevation, which was below 40 degrees at the time. We show that these chunks could have been propelled by the surrounding asymmetric gas field, which produced a distinct lateral acceleration. Both water and carbon dioxide may have contributed to their mobilization, while the season and local topography are among the deciding factors. The mechanisms sustaining regular activity of comets at large heliocentric distances are likely more diverse and intricate than previously thought.
Submitted 18 January, 2024;
originally announced January 2024.
-
Enhanced Automated Quality Assessment Network for Interactive Building Segmentation in High-Resolution Remote Sensing Imagery
Authors:
Zhili Zhang,
Xiangyun Hu,
Jiabo Xu
Abstract:
In this research, we introduce the enhanced automated quality assessment network (IBS-AQSNet), an innovative solution for assessing the quality of interactive building segmentation in high-resolution remote sensing imagery. This is a new challenge in segmentation quality assessment, and our proposed IBS-AQSNet addresses it by identifying missed and mistaken segment areas. First, to acquire robust image features, our method combines a robust, pre-trained backbone with a lightweight counterpart for comprehensive feature extraction from imagery and segmentation results. These features are then fused through a simple combination of concatenation, convolution layers, and residual connections. Additionally, IBS-AQSNet incorporates a multi-scale differential quality assessment decoder, proficient in pinpointing areas where the segmentation result is either missed or mistaken. Experiments on a newly built EVLab-BGZ dataset, which includes over 39,198 buildings, demonstrate the superiority of the proposed method in automating segmentation quality assessment, thereby setting a new benchmark in the field.
Submitted 18 January, 2024;
originally announced January 2024.
-
Technical Report: On the Convergence of Gossip Learning in the Presence of Node Inaccessibility
Authors:
Tian Liu,
Yue Cui,
Xueyang Hu,
Yecheng Xu,
Bo Liu
Abstract:
Gossip learning (GL), a decentralized alternative to federated learning (FL), is better suited to resource-constrained wireless networks, such as Flying Ad-Hoc Networks (FANETs) formed by unmanned aerial vehicles (UAVs). GL can significantly enhance the efficiency and extend the battery life of UAV networks. Despite these advantages, the performance of GL is strongly affected by data distribution, communication speed, and network connectivity, and how these factors influence GL convergence is still unclear. Existing work has analyzed GL convergence using a virtual quantity for convenience, which fails to reflect the real state of the network when some nodes are inaccessible. In this paper, we formulate and investigate the impact of inaccessible nodes on GL under a dynamic network topology. We first decompose the weight divergence according to whether each node is accessible or not. We then investigate GL convergence under dynamic node accessibility and theoretically characterize how the number of inaccessible nodes, the degree of data non-i.i.d.-ness, and the duration of inaccessibility affect convergence. Extensive experiments in practical settings verify our theoretical findings.
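To make node inaccessibility concrete, here is a toy sketch of randomized pairwise gossip averaging in which each node is independently reachable with some probability in every round. It is a hypothetical illustration, not the authors' model: it averages one scalar per node instead of model weights and omits local training, data non-i.i.d.-ness, and communication delays. The names `gossip_average`, `accessible_prob`, and `spread` are assumptions.

```python
import random


def gossip_average(values, accessible_prob, rounds, seed=0):
    """Randomized pairwise gossip averaging with intermittently inaccessible nodes.

    In every round each node is independently accessible with probability
    `accessible_prob` (a hypothetical inaccessibility model). Accessible nodes
    are paired at random and each pair replaces both values with their mean.
    Inaccessible nodes keep their value for that round.
    """
    rng = random.Random(seed)
    x = [float(v) for v in values]
    for _ in range(rounds):
        active = [i for i in range(len(x)) if rng.random() < accessible_prob]
        rng.shuffle(active)
        # Pair consecutive active nodes; an odd node out simply waits this round.
        for i, j in zip(active[0::2], active[1::2]):
            x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x


def spread(x):
    """Max-min disagreement across nodes; 0 means consensus."""
    return max(x) - min(x)


initial = list(range(20))  # e.g. one scalar "weight" per UAV node
for p in (1.0, 0.5, 0.2):
    final = gossip_average(initial, accessible_prob=p, rounds=10)
    print(f"accessible_prob={p}: spread after 10 rounds = {spread(final):.4f}")
```

Pairwise averaging preserves the sum of the values, so nodes that reach consensus agree on the true mean; lowering `accessible_prob` reduces the number of exchanges per round and typically slows how fast the spread shrinks.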
Submitted 18 February, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.