-
Localization-Delocalization Transitions in Non-Hermitian Aharonov-Bohm Cages
Authors:
Xiang Li,
Jin Liu,
Tao Liu
Abstract:
A unique feature of non-Hermitian systems is the extreme sensitivity of the eigenspectrum to boundary conditions, which accompanies the emergence of the non-Hermitian skin effect (NHSE). The NHSE originates from the point-gap topology of the complex eigenspectrum, where an extensive number of eigenstates are anomalously localized at the boundary, driven by nonreciprocal dissipation. Two different approaches to creating localization are disorder and a flat-band spectrum, and their interplay can lead to anomalous inverse Anderson localization, where Bernoulli anti-symmetric disorder induces mobility in a fully flat-band system in the presence of an Aharonov-Bohm (AB) cage. In this work, we study the localization-delocalization transitions due to the interplay of point-gap topology, flat bands and correlated disorder in the one-dimensional rhombic lattice, where both its Hermitian and non-Hermitian structures show AB caging in the presence of magnetic flux. Although localization and delocalization coexist in the Hermitian rhombic lattice in the presence of random anti-symmetric disorder, the non-Hermitian lattice surprisingly becomes completely delocalized, accompanied by the emergence of the NHSE. Studying further the effects of Bernoulli anti-symmetric disorder, we find a similar NHSE due to the interplay of point-gap topology, correlated disorder and flat bands. The anomalous localization-delocalization property can be experimentally tested on classical physical platforms, such as electrical circuits.
Submitted 12 March, 2024;
originally announced March 2024.
-
A Unified Model for Spatio-Temporal Prediction Queries with Arbitrary Modifiable Areal Units
Authors:
Liyue Chen,
Jiangyi Fang,
Tengfei Liu,
Shaosheng Cao,
Leye Wang
Abstract:
Spatio-Temporal (ST) prediction is crucial for making informed decisions in urban location-based applications like ride-sharing. However, existing ST models often require region partition as a prerequisite, resulting in two main pitfalls. Firstly, location-based services necessitate ad-hoc regions for various purposes, requiring multiple ST models with varying scales and zones, which can be costly to support. Secondly, different ST models may produce conflicting outputs, resulting in confusing predictions. In this paper, we propose One4All-ST, a framework that can conduct ST prediction for arbitrary modifiable areal units using only one model. To reduce the cost of getting multi-scale predictions, we design an ST network with hierarchical spatial modeling and scale normalization modules to efficiently and equally learn multi-scale representations. To address prediction inconsistencies across scales, we propose a dynamic programming scheme to solve the formulated optimal combination problem, minimizing predicted error through theoretical analysis. Besides, we suggest using an extended quad-tree to index the optimal combinations for quick response to arbitrary modifiable areal units in practical online scenarios. Extensive experiments on two real-world datasets verify the efficiency and effectiveness of One4All-ST in ST prediction for arbitrary modifiable areal units. The source codes and data of this work are available at https://github.com/uctb/One4All-ST.
Submitted 9 March, 2024;
originally announced March 2024.
-
Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction
Authors:
Qing Xiao,
Siyeop Yoon,
Hui Ren,
Matthew Tivnan,
Lichao Sun,
Quanzheng Li,
Tianming Liu,
Yu Zhang,
Xiang Li
Abstract:
Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffer from temporal sparsity and incompleteness, presenting substantial challenges in modeling the disease's progression accurately. Existing methods are limited, focusing primarily on datasets without missing entries or requiring predefined assumptions about CTh progression. To overcome these obstacles, we propose a conditional score-based diffusion model specifically designed to generate CTh trajectories from given baseline information, such as age, sex, and initial diagnosis. Our conditional diffusion model utilizes all available data during the training phase and makes predictions based solely on baseline information during inference, without needing prior history of CTh progression. The prediction accuracy of the proposed CTh prediction pipeline using a conditional score-based model was compared for sub-groups consisting of cognitively normal, mild cognitive impairment, and AD subjects. The Bland-Altman analysis shows that our diffusion-based prediction model has a near-zero bias with a narrow 95% confidence interval compared to the ground-truth CTh over 6-36 months. In addition, because our conditional diffusion model is stochastic and generative, we demonstrate an uncertainty analysis of patient-specific CTh prediction through multiple realizations.
Submitted 11 March, 2024;
originally announced March 2024.
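The Bland-Altman analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the textbook definition (bias is the mean of the paired differences, and the 95% limits are bias ± 1.96 standard deviations of those differences); whether the paper's reported interval is computed exactly this way is not stated in the abstract. The function name and the sample values are hypothetical.

```python
import statistics

def bland_altman(predicted, reference):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Textbook Bland-Altman: bias is the mean paired difference and the
    95% limits of agreement are bias +/- 1.96 * SD of the differences.
    This is an assumed, generic definition, not the paper's pipeline.
    """
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical cortical-thickness values in mm (illustrative only).
predicted_cth = [2.50, 2.42, 2.61, 2.38]
reference_cth = [2.48, 2.45, 2.60, 2.40]
bias, lower, upper = bland_altman(predicted_cth, reference_cth)
print(f"bias={bias:.4f} mm, limits=({lower:.4f}, {upper:.4f}) mm")
```

A near-zero bias with narrow limits, as the abstract reports, would correspond to differences that are both centered on zero and tightly clustered.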
-
Determination of the number of $ψ(3686)$ events taken at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.
Submitted 28 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
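The totals quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given there; the observation that the total uncertainty matches a linear sum rather than a quadrature sum is an inference from those numbers (it would be consistent with treating the systematic uncertainties as fully correlated), not something the abstract states.

```python
import math

# (central value, systematic uncertainty) in units of 10^6 events,
# copied from the abstract.
samples = {
    "2009": (107.7, 0.6),
    "2012": (345.4, 2.6),
    "2021": (2259.3, 11.1),
}

def combine(samples):
    """Return (total, linear-sum uncertainty, quadrature-sum uncertainty)."""
    total = sum(value for value, _ in samples.values())
    linear = sum(unc for _, unc in samples.values())
    quadrature = math.sqrt(sum(unc * unc for _, unc in samples.values()))
    return round(total, 1), round(linear, 1), round(quadrature, 1)

total, linear, quadrature = combine(samples)
print(total, linear, quadrature)  # 2712.4 14.3 11.4
```

The linear sum reproduces the quoted $(2712.4\pm14.3)\times10^6$, while adding the uncertainties in quadrature would give a smaller value.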
-
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Authors:
Liang Chen,
Haozhe Zhao,
Tianyu Liu,
Shuai Bai,
Junyang Lin,
Chang Zhou,
Baobao Chang
Abstract:
In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs, suggesting a need for a sparser approach compared to textual data handling. To this end, we introduce FastV, a versatile plug-and-play method designed to optimize computational efficiency by learning adaptive attention patterns in early layers and pruning visual tokens in subsequent ones. Our evaluations demonstrate FastV's ability to dramatically reduce computational costs (e.g., a 45% reduction in FLOPs for LLaVA-1.5-13B) without sacrificing performance in a wide range of image and video understanding tasks. The trade-off between computational efficiency and performance in FastV is highly customizable and Pareto-efficient. It can compress the FLOPs of a 13B-parameter model to achieve a lower budget than that of a 7B-parameter model, while still maintaining superior performance. We believe FastV has practical value for the deployment of LVLMs on edge devices and in commercial models. Code is released at https://github.com/pkunlp-icler/FastV.
Submitted 25 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
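The idea of pruning visual tokens after an early layer, described in the abstract above, can be sketched as follows. This is an illustration under stated assumptions, not the released FastV code: the ranking criterion (attention received by each token), the function name `prune_visual_tokens`, and the default `keep_ratio` of 0.5 (suggested by the "1/2 tokens" in the title) are all assumptions made for this example.

```python
def prune_visual_tokens(attn_received, visual_idx, keep_ratio=0.5):
    """Return the sorted indices of the tokens kept after pruning.

    attn_received: one score per token, e.g. the average attention each
        token receives at the chosen layer (assumed ranking criterion).
    visual_idx: positions of the visual tokens in the sequence.
    keep_ratio: fraction of visual tokens to keep (hypothetical default).
    Non-visual (text) tokens are always kept.
    """
    n_keep = max(1, int(len(visual_idx) * keep_ratio))
    # Rank visual tokens from most to least attended.
    ranked = sorted(visual_idx, key=lambda i: attn_received[i], reverse=True)
    kept_visual = set(ranked[:n_keep])
    visual = set(visual_idx)
    return [i for i in range(len(attn_received))
            if i not in visual or i in kept_visual]

# Toy sequence: tokens 1-4 are visual, tokens 0 and 5 are text.
scores = [0.30, 0.05, 0.20, 0.01, 0.10, 0.34]
print(prune_visual_tokens(scores, visual_idx=[1, 2, 3, 4]))  # [0, 2, 4, 5]
```

Later layers would then operate on the shorter sequence, which is where the FLOPs savings reported in the abstract would come from.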
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Authors:
Peng Dai,
Yang Zhang,
Tao Liu,
Zhen Fan,
Tianyuan Du,
Zhuo Su,
Xiaozheng Zheng,
Zeming Li
Abstract:
It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO. In this paper, we propose HMD-Poser, the first unified approach to recover full-body motions using scalable sparse observations from HMD and body-worn IMUs. In particular, it can support a variety of input scenarios, such as HMD, HMD+2IMUs, HMD+3IMUs, etc. The scalability of inputs may accommodate users' choices for both high tracking accuracy and ease of wear. A lightweight temporal-spatial feature learning network is proposed in HMD-Poser to guarantee that the model runs in real-time on HMDs. Furthermore, HMD-Poser presents online body shape estimation to improve the position accuracy of body joints. Extensive experimental results on the challenging AMASS dataset show that HMD-Poser achieves new state-of-the-art results in both accuracy and real-time performance. We also build a new free-dancing motion dataset to evaluate HMD-Poser's on-device performance and investigate the performance gap between synthetic data and real-captured sensor data. Finally, we demonstrate our HMD-Poser with a real-time Avatar-driving application on a commercial HMD. Our code and free-dancing motion dataset are available at https://pico-ai-team.github.io/hmd-poser
Submitted 6 March, 2024;
originally announced March 2024.
-
LDSF: Lightweight Dual-Stream Framework for SAR Target Recognition by Coupling Local Electromagnetic Scattering Features and Global Visual Features
Authors:
Xuying Xiong,
Xinyu Zhang,
Weidong Jiang,
Tianpeng Liu
Abstract:
Mainstream DNN-based SAR-ATR methods still face issues such as a tendency to overfit when training data are scarce, high computational overhead, and poor interpretability of the black-box model. Integrating physical knowledge into DNNs to improve performance and achieve a higher level of physical interpretability becomes the key to solving the above problems. This paper begins by focusing on the electromagnetic (EM) backscattering mechanism. We extract the EM scattering (EMS) information from the complex SAR data and integrate the physical properties of the target into the network through a dual-stream framework to guide the network to learn physically meaningful and discriminative features. Specifically, one stream is the local EMS feature (LEMSF) extraction net. It is a heterogeneous graph neural network (GNN) guided by a multi-level multi-head attention mechanism. LEMSF uses the EMS information to obtain topological structure features and high-level physical semantic features. The other stream is a CNN-based global visual features (GVF) extraction net that captures the visual features of SAR pictures from the image domain. After obtaining the two-stream features, a feature fusion subnetwork is proposed to adaptively learn the fusion strategy, so that the two-stream features can maximize the performance. Furthermore, the loss function is designed based on the graph distance measure to promote intra-class aggregation. We discard overly complex design ideas and effectively control the model size while maintaining algorithm performance. Finally, to better validate the performance and generalizability of the algorithms, two more rigorous evaluation protocols, namely once-for-all (OFA) and less-for-more (LFM), are used to verify the superiority of the proposed algorithm on the MSTAR dataset.
Submitted 6 March, 2024;
originally announced March 2024.
-
Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.
Submitted 6 March, 2024;
originally announced March 2024.
-
Foot Shape-Dependent Resistive Force Model for Bipedal Walkers on Granular Terrains
Authors:
Xunjie Chen,
Aditya Anikode,
Jingang Yi,
Tao Liu
Abstract:
Legged robots have demonstrated high efficiency and effectiveness in unstructured and dynamic environments. However, it is still challenging for legged robots to achieve rapid and efficient locomotion on deformable, yielding substrates, such as granular terrains. We present an enhanced resistive force model for bipedal walkers on soft granular terrains by introducing effective intrusion depth correction. The enhanced force model captures fundamental kinetic results considering the robot foot shape, walking gait speed variation, and energy expense. The model is validated by extensive foot intrusion experiments with a bipedal robot. The results confirm the model accuracy on the given type of granular terrains. The model can be further integrated with the motion control of bipedal robotic walkers.
Submitted 5 March, 2024;
originally announced March 2024.
-
Prediction of turbulent channel flow using Fourier neural operator-based machine-learning strategy
Authors:
Yunpeng Wang,
Zhijie Li,
Zelong Yuan,
Wenhui Peng,
Tianyuan Liu,
Jianchun Wang
Abstract:
Fast and accurate predictions of turbulent flows are of great importance in the science and engineering field. In this paper, we investigate the implicit U-Net enhanced Fourier neural operator (IUFNO) in the stable prediction of long-time dynamics of three-dimensional (3D) turbulent channel flows. The trained IUFNO models are tested in the large-eddy simulations (LES) at coarse grids for three friction Reynolds numbers: $Re_τ\approx180$, $395$ and $590$. The adopted near-wall mesh grids are tangibly coarser than the general requirements for wall-resolved LES. The numerical experiments show that the IUFNO framework outperforms the traditional dynamic Smagorinsky model (DSM) and the wall-adapted local eddy-viscosity (WALE) model in the predictions of a variety of flow statistics and structures, including the mean and fluctuating velocities, the probability density functions (PDFs) and joint PDF of velocity fluctuations, the Reynolds stress profile, the kinetic energy spectrum, and the Q-criterion (vortex structures). Meanwhile, the trained IUFNO models are computationally much faster than the traditional LES models. Thus, the IUFNO is a promising approach for the fast prediction of wall-bounded turbulent flow.
Submitted 5 March, 2024;
originally announced March 2024.
-
Canonical Hamiltonian Guiding Center Dynamics and Its Intrinsic Magnetic Moment
Authors:
Ruili Zhang,
Jian Liu,
Tong Liu,
Wenxiang Li,
Xiaogang Wang,
Yifa Tang
Abstract:
The concept of the guiding center is a powerful tool in astrophysics, space plasmas, fusion research, and arc plasmas for solving the multi-scale dynamics of magnetized plasmas. In this letter, we rigorously prove that the guiding center dynamics can generally be described as a constrained canonical Hamiltonian system with two constraints in six-dimensional phase space, and that the solution flow of the guiding center lies on a canonical symplectic sub-manifold. The guiding center can thus be modeled as a pseudo-particle with an intrinsic magnetic moment, which properly replaces the charged particle dynamics on time scales larger than the gyro-period. The complete dynamical behaviors, such as the velocity and force, of the guiding center pseudo-particle can be clearly deduced from the model. Furthermore, a series of related theories, such as symplectic numerical methods, the canonical gyro-kinetic theory, and canonical particle-in-cell algorithms, can be systematically developed based on the canonical guiding center system. The canonical guiding center theory also sheds light on the origin of the intrinsic magnetic moment.
Submitted 5 March, 2024;
originally announced March 2024.
-
AS-ES Learning: Towards Efficient CoT Learning in Small Models
Authors:
Nuwa Xi,
Yuhan Chen,
Sendong Zhao,
Haochun Wang,
Bing Qin,
Ting Liu
Abstract:
Chain-of-Thought (CoT) serves as a critical emerging ability in LLMs, especially when it comes to logical reasoning. Attempts have been made to induce such ability in small models as well by distilling from the data with CoT generated by Large Language Models (LLMs). However, existing methods often simply generate and incorporate more data from LLMs and fail to note the importance of efficiently utilizing existing CoT data. We here propose a new training paradigm AS-ES (Abstractive Segments - Extractive Segments) learning, which exploits the inherent information in CoT for iterative generation. Experiments show that our methods surpass the direct seq2seq training on CoT-extensive tasks like MWP and PET summarization, without data augmentation or altering the model itself. Furthermore, we explore the reason behind the inefficiency of small models in learning CoT and provide an explanation of why AS-ES learning works, giving insights into the underlying mechanism of CoT.
Submitted 4 March, 2024;
originally announced March 2024.
-
Mitigating Label Noise on Graph via Topological Sample Selection
Authors:
Yuhao Wu,
Jiangchao Yao,
Xiaobo Xia,
Jun Yu,
Ruxin Wang,
Bo Han,
Tongliang Liu
Abstract:
Despite the success of carefully annotated benchmarks, the effectiveness of existing graph neural networks (GNNs) can be considerably impaired in practice when real-world graph data is noisily labeled. Sample selection has been demonstrated to be an effective way of learning robustly with noisy labels. However, conventional studies focus on i.i.d. data, and when moving to non-i.i.d. graph data and GNNs, two notable challenges remain: (1) nodes located near topological class boundaries are very informative for classification but cannot be successfully distinguished by heuristic sample selection; (2) there is no available measure that considers the graph topological information to promote sample selection in a graph. To address this dilemma, we propose a Topological Sample Selection (TSS) method that boosts the informative sample selection process in a graph by utilising topological information. We theoretically prove that our procedure minimizes an upper bound of the expected risk under the target clean distribution, and experimentally show the superiority of our method compared with state-of-the-art baselines.
Submitted 4 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Observation of $ψ(3686)\to 3φ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.
Submitted 4 March, 2024;
originally announced March 2024.
-
Hypertext Entity Extraction in Webpage
Authors:
Yifei Yang,
Tianqiao Liu,
Bo Shao,
Hai Zhao,
Linjun Shou,
Ming Gong,
Daxin Jiang
Abstract:
Webpage entity extraction is a fundamental natural language processing task in both research and applications. Nowadays, the majority of webpage entity extraction models are trained on structured datasets which strive to retain textual content and its structure information. However, existing datasets all overlook the rich hypertext features (e.g., font color, font size) which have shown their effectiveness in previous works. To this end, we first collect a Hypertext Entity Extraction Dataset (HEED) from the e-commerce domains, scraping both the text and the corresponding explicit hypertext features with high-quality manual entity annotations. Furthermore, we present the MoE-based Entity Extraction Framework (MoEEF), which efficiently integrates multiple features to enhance model performance by Mixture of Experts and outperforms strong baselines, including the state-of-the-art small-scale models and GPT-3.5-turbo. Moreover, the effectiveness of hypertext features in HEED and of several model components in MoEEF is analyzed.
Submitted 3 March, 2024;
originally announced March 2024.
-
RKHS-BA: A Semantic Correspondence-Free Multi-View Registration Framework with Global Tracking
Authors:
Ray Zhang,
Jingwei Song,
Xiang Gao,
Junzhe Wu,
Tianyi Liu,
Jinyuan Zhang,
Ryan Eustice,
Maani Ghaffari
Abstract:
This work reports a novel Bundle Adjustment (BA) formulation using a Reproducing Kernel Hilbert Space (RKHS) representation called RKHS-BA. The proposed formulation is correspondence-free, enables the BA to use RGB-D/LiDAR and semantic labels in the optimization directly, and provides a generalization for the photometric loss function commonly used in direct methods. RKHS-BA can incorporate appearance and semantic labels within a continuous spatial-semantic functional representation that does not require optimization via image pyramids. We demonstrate its applications in sliding-window odometry and global LiDAR mapping, which show highly robust performance in extremely challenging scenes and the best trade-off of generalization and accuracy.
Submitted 2 March, 2024;
originally announced March 2024.
-
Longitudinal beam dynamics design for Super Tau-Charm Facility
Authors:
Linhao Zhang,
Tao Liu,
Sangya Li,
**gyu Tang,
Qing Luo
Abstract:
The project of Super Tau-Charm Facility (STCF) proposed in China, as a new-generation high-luminosity $e^+e^-$ collider in the low-energy region with the center-of-mass energy of 2-7 GeV, is well underway. The luminosity is targeted at $1.0\times10^{35} cm^{-2}s^{-1}$ at the optimized beam energy of 2 GeV. Longitudinal beam dynamics becomes of great importance for the STCF due to the constraints from the novel beam-beam effect called coherent X-Z instability and severe beam collective effects. In this paper, we will develop an iterative optimization model for the STCF longitudinal beam dynamics design, which takes into account the influence of transverse dynamics, coherent X-Z instability, and collective effects.
Submitted 1 March, 2024;
originally announced March 2024.
-
Assessing Bilateral Neurovascular Bundles Function with Pulsed Wave Doppler Ultrasound: Implications for Reducing Erectile Dysfunction Following Prostate Radiotherapy
Authors:
**g Wang,
Xiaofeng Yang,
Boran Zhou,
James Sohn,
Richard Qiu,
Pretesh Patel,
Ashesh B. Jani,
Tian Liu
Abstract:
This study aims to evaluate the functional status of bilateral neurovascular bundles (NVBs) using pulsed wave Doppler ultrasound in patients undergoing prostate radiotherapy (RT). Sixty-two patients (mean age: 66.1 +/- 7.2 years) underwent transrectal ultrasound scan using a conventional ultrasound scanner, a 7.5 MHz bi-plane probe and a mechanical stepper. The ultrasound protocol comprised 3 steps: 1) 3D B-mode scans of the entire prostate, 2) localization of NVBs using color flow Doppler imaging, and 3) measurement of NVB function using pulsed wave Doppler. Five pulsed Doppler waveform features were extracted: peak systolic velocity (PSV), end-diastolic velocity (EDV), mean velocity (Vm), resistive index (RI), and pulsatile index (PI). In summary, this study presents a Doppler evaluation of NVBs in patients undergoing prostate RT. It highlights substantial differences in Doppler ultrasound waveform features between bilateral NVBs. The proposed ultrasound method may prove valuable as clinicians strive to deliver NVB-sparing RT to preserve sexual function effectively and enhance patients' overall well-being.
Submitted 29 February, 2024;
originally announced March 2024.
-
The Giant Graviton Expansion from Bubbling Geometry
Authors:
Evan Deddo,
James T. Liu,
Leopoldo A. Pando Zayas,
Robert J. Saskowski
Abstract:
The superconformal index of half-BPS states in ${\cal N}=4$ supersymmetric Yang-Mills with gauge group $U(N)$ admits an expansion in terms of giant gravitons, ${\cal I}_N(q)={\cal I}_\infty(q) \sum\limits_{m=0}^\infty q^{mN}\hat{\mathcal I}_m(q)$, where $m$ is the number of giant gravitons. We derive this expansion directly in supergravity from the class of half-BPS solutions due to Lin, Lunin, and Maldacena in type IIB supergravity. The moduli space of these configurations can be quantized using covariant quantization methods. We review how this quantization leads to the graviton index, ${\cal I}_\infty(q)$, and present a modification that leads to the precise expression for the expansion in terms of giant gravitons. Our proposal provides a derivation of the giant graviton expansion directly in terms of supergravity degrees of freedom. We also comment on how to derive the expansion in terms of the effective Fermi droplet picture.
Submitted 29 February, 2024;
originally announced February 2024.
-
StarCoder 2 and The Stack v2: The Next Generation
Authors:
Anton Lozhkov,
Raymond Li,
Loubna Ben Allal,
Federico Cassano,
Joel Lamy-Poirier,
Nouamane Tazi,
Ao Tang,
Dmytro Pykhtar,
Jiawei Liu,
Yuxiang Wei,
Tianyang Liu,
Max Tian,
Denis Kocetkov,
Arthur Zucker,
Younes Belkada,
Zijian Wang,
Qian Liu,
Dmitry Abulkhanov,
Indraneil Paul,
Zhuang Li,
Wen-Ding Li,
Megan Risdal,
Jia Li,
Jian Zhu,
Terry Yue Zhuo
, et al. (41 additional authors not shown)
Abstract:
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
Submitted 29 February, 2024;
originally announced February 2024.
-
Light quark mediated Higgs boson production in association with a jet at the next-to-next-to-leading order and beyond
Authors:
Tao Liu,
Alexander A. Penin,
Abdur Rehman
Abstract:
We study the light quark effect on the Higgs boson production in association with a jet at the LHC in the intermediate transverse momentum region between the quark and the Higgs boson mass scales. Though the effect is suppressed by the small Yukawa coupling, it is enhanced by large logarithms of the quark mass ratio to the Higgs boson mass or transverse momentum. Following a remarkable success of the logarithmic expansion [39] for the prediction of the next-to-next-to-leading bottom quark contribution to the total cross section of the Higgs boson production we extend the analysis to its kinematical distributions. A new factorization formula is derived for the light quark mediated $gg\to Hg$ amplitudes and the differential cross section of the process is computed in the logarithmic approximation, which is used for an estimate of the bottom quark effect at the next-to-next-to-leading order.
Submitted 28 February, 2024;
originally announced February 2024.
-
Evaluating Quantized Large Language Models
Authors:
Shiyao Li,
Xuefei Ning,
Luning Wang,
Tengxuan Liu,
Xiangsheng Shi,
Shengen Yan,
Guohao Dai,
Huazhong Yang,
Yu Wang
Abstract:
Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs). Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs. To meet the requirements of both high efficiency and performance across diverse scenarios, a comprehensive evaluation of quantized LLMs is essential to guide the selection of quantization methods. This paper presents a thorough evaluation of these factors by evaluating the effect of PTQ on Weight, Activation, and KV Cache on 11 model families, including OPT, LLaMA2, Falcon, Bloomz, Mistral, ChatGLM, Vicuna, LongChat, StableLM, Gemma, and Mamba, with parameters ranging from 125M to 180B. The evaluation encompasses five types of tasks: basic NLP, emergent ability, trustworthiness, dialogue, and long-context tasks. Moreover, we also evaluate the state-of-the-art (SOTA) quantization methods to demonstrate their applicability. Based on the extensive experiments, we systematically summarize the effect of quantization, provide recommendations to apply quantization techniques, and point out future directions. The code can be found in https://github.com/thu-nics/qllm-eval.
Submitted 6 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
Authors:
Tong Liu,
Yingjie Zhang,
Zhe Zhao,
Yinpeng Dong,
Guozhu Meng,
Kai Chen
Abstract:
In recent years, large language models (LLMs) have demonstrated notable success across various tasks, but the trustworthiness of LLMs is still an open problem. One specific threat is the potential to generate toxic or harmful responses. Attackers can craft adversarial prompts that induce harmful responses from LLMs. In this work, we pioneer a theoretical foundation in LLMs security by identifying bias vulnerabilities within the safety fine-tuning and design a black-box jailbreak method named DRA (Disguise and Reconstruction Attack), which conceals harmful instructions through disguise and prompts the model to reconstruct the original harmful instruction within its completion. We evaluate DRA across various open-source and closed-source models, showcasing state-of-the-art jailbreak success rates and attack efficiency. Notably, DRA boasts a 91.1% attack success rate on OpenAI GPT-4 chatbot.
Submitted 10 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
HardTaint: Production-Run Dynamic Taint Analysis via Selective Hardware Tracing
Authors:
Yiyu Zhang,
Tianyi Liu,
Yueyang Wang,
Yun Qi,
Kai Ji,
Jian Tang,
Xiaoliang Wang,
Xuandong Li,
Zhiqiang Zuo
Abstract:
Dynamic taint analysis (DTA), as a fundamental analysis technique, is widely used in security, privacy, and diagnosis, etc. As DTA demands to collect and analyze massive taint data online, it suffers extremely high runtime overhead. Over the past decades, numerous attempts have been made to lower the overhead of DTA. Unfortunately, the reductions they achieved are marginal, causing DTA only applicable to the debugging/testing scenarios. In this paper, we propose and implement HardTaint, a system that can realize production-run dynamic taint tracking. HardTaint adopts a hybrid and systematic design which combines static analysis, selective hardware tracing and parallel graph processing techniques. The comprehensive evaluations demonstrate that HardTaint introduces only around 9% runtime overhead which is an order of magnitude lower than the state-of-the-arts, while without sacrificing any taint detection capability.
Submitted 27 February, 2024;
originally announced February 2024.
-
The LOFAR-eFEDS survey: The incidence of radio and X-ray AGN and the disk-jet connection
Authors:
Z. Igo,
A. Merloni,
D. Hoang,
J. Buchner,
T. Liu,
M. Salvato,
R. Arcodia,
S. Bellstedt,
M. Brüggen,
J. H. Croston,
F. de Gasperin,
A. Georgakakis,
M. J. Hardcastle,
K. Nandra,
Q. Ni,
T. Pasini,
T. Shimwell,
J. Wolf
Abstract:
Radio jets are present in a diverse sample of AGN. However, the mechanisms of jet powering are not fully understood, and it is yet unclear to what extent they obey mass-invariant scaling relations, similar to those found for the triggering and fuelling of X-ray selected AGN. We study the incidence of eROSITA/eFEDS X-ray and LOFAR radio AGN as a function of several stellar mass normalised AGN power indicators. A new sample of radio AGN from the LOFAR-eFEDS survey is defined and we publicly release this catalogue, including host galaxy counterparts from the Legacy Survey DR9, LOFAR radio morphologies and host galaxy properties from the complete, spectroscopic (z<0.4) GAMA09 survey. The fraction of GAMA09 galaxies hosting radio, X-ray and both radio and X-ray AGN are calculated as a function of the specific black hole kinetic ($λ_{\rm Jet}$) and radiative ($λ_{\rm Edd}$) power. The incidence of eFEDS X-ray AGN as a function of $λ_{\rm Edd}$ shows the same mass-invariance as found in past studies. Meanwhile, radio AGN, regardless of their morphology, are more likely to be hosted in more massive galaxies, at all $λ_{\rm Jet}$. Across the stellar mass range, the compact radio AGN incidence follows the same power-law distribution, showing that it is not only high mass galaxies that host high power radio AGN and vice versa. On the other hand, the incidence of compact and complex radio AGN is boosted at the highest jet powers, diverging from a simple power-law. Interestingly, this increased incidence cannot be explained by more powerful radio AGN lying in more dense environments which could naturally boost their radio luminosity. Overall, we show that statistical incidence studies are a powerful method to probe disk-jet coupling for different AGN accretion modes, although future work on a more reliable determination of jet power for diverse samples of radio AGN is needed.
Submitted 26 February, 2024;
originally announced February 2024.
-
Efficient Continuous-Time Ego-Motion Estimation for Asynchronous Event-based Data Associations
Authors:
Zhixiang Wang,
Xudong Li,
Tianle Liu,
Yizhai Zhang,
Panfeng Huang
Abstract:
Event cameras are bio-inspired vision sensors that asynchronously measure per-pixel brightness changes. The high temporal resolution and asynchronicity of event cameras offer great potential for estimating the robot motion state. Recent works have adopted the continuous-time ego-motion estimation methods to exploit the inherent nature of event cameras. However, most of the adopted methods have poor real-time performance. To alleviate it, a lightweight Gaussian Process (GP)-based estimation framework is proposed to efficiently estimate motion trajectory from asynchronous event-driven data associations. Concretely, an asynchronous front-end pipeline is designed to adapt event-driven feature trackers and generate feature trajectories from event streams; a parallel dynamic sliding-window back-end is presented within the framework of sparse GP regression on SE(3). Notably, a specially designed state marginalization strategy is employed to ensure the consistency and sparsity of this GP regression. Experiments conducted on synthetic and real-world datasets demonstrate that the proposed method achieves competitive precision and superior robustness compared to the state-of-the-art. Furthermore, the evaluations on three 60 s trajectories show that the proposal outperforms the ISAM2-based method in terms of computational efficiency by 2.64, 4.22, and 11.70 times, respectively.
Submitted 26 February, 2024;
originally announced February 2024.
-
High-order topological pumping on a superconducting quantum processor
Authors:
Cheng-Lin Deng,
Yu Liu,
Yu-Ran Zhang,
Xue-Gang Li,
Tao Liu,
Chi-Tong Chen,
Tong Liu,
Cong-Wei Lu,
Yong-Yi Wang,
Tian-Ming Li,
Cai-** Fang,
Si-Yun Zhou,
Jia-Cheng Song,
Yue-Shan Xu,
Yang He,
Zheng-He Liu,
Kai-Xuan Huang,
Zhong-Cheng Xiang,
Jie-Ci Wang,
Dong-Ning Zheng,
Guang-Ming Xue,
Kai Xu,
H. F. Yu,
Heng Fan
Abstract:
High-order topological phases of matter refer to the systems of $n$-dimensional bulk with the topology of $m$-th order, exhibiting $(n-m)$-dimensional boundary modes and can be characterized by topological pumping. Here, we experimentally demonstrate two types of second-order topological pumps, forming four 0-dimensional corner localized states on a 4$\times$4 square lattice array of 16 superconducting qubits. The initial ground state of the system for half-filling, as a product of four identical entangled 4-qubit states, is prepared using an adiabatic scheme. During the pumping procedure, we adiabatically modulate the superlattice Bose-Hubbard Hamiltonian by precisely controlling both the hopping strengths and on-site potentials. At the half pumping period, the system evolves to a corner-localized state in a quadrupole configuration. The robustness of the second-order topological pump is also investigated by introducing different on-site disorder. Our work studies the topological properties of high-order topological phases from the dynamical transport picture using superconducting qubits, which would inspire further research on high-order topological phases.
Submitted 25 February, 2024;
originally announced February 2024.
-
From COBIT to ISO 42001: Evaluating Cybersecurity Frameworks for Opportunities, Risks, and Regulatory Compliance in Commercializing Large Language Models
Authors:
Timothy R. McIntosh,
Teo Susnjak,
Tong Liu,
Paul Watters,
Raza Nowrozy,
Malka N. Halgamuge
Abstract:
This study investigated the integration readiness of four predominant cybersecurity Governance, Risk and Compliance (GRC) frameworks - NIST CSF 2.0, COBIT 2019, ISO 27001:2022, and the latest ISO 42001:2023 - for the opportunities, risks, and regulatory compliance when adopting Large Language Models (LLMs), using qualitative content analysis and expert validation. Our analysis, with both LLMs and human experts in the loop, uncovered potential for LLM integration together with inadequacies in LLM risk oversight of those frameworks. Comparative gap analysis has highlighted that the new ISO 42001:2023, specifically designed for Artificial Intelligence (AI) management systems, provided most comprehensive facilitation for LLM opportunities, whereas COBIT 2019 aligned most closely with the impending European Union AI Act. Nonetheless, our findings suggested that all evaluated frameworks would benefit from enhancements to more effectively and more comprehensively address the multifaceted risks associated with LLMs, indicating a critical and time-sensitive need for their continuous evolution. We propose integrating human-expert-in-the-loop validation processes as crucial for enhancing cybersecurity frameworks to support secure and compliant LLM integration, and discuss implications for the continuous evolution of cybersecurity GRC frameworks to support the secure integration of LLMs.
Submitted 24 February, 2024;
originally announced February 2024.
-
Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding
Authors:
Siqi Wang,
Hailong Yang,
Xuezhu Wang,
Tongxuan Liu,
Pengbo Wang,
Xuning Liang,
Kejie Ma,
Tianyu Feng,
Xin You,
Yongjun Bao,
Yi Liu,
Zhongzhi Luan,
Depei Qian
Abstract:
Large language models (LLM) have recently attracted surging interest due to their outstanding capabilities across various domains. However, enabling efficient LLM inference is challenging due to its autoregressive decoding that generates tokens only one at a time. Although research works apply pruning or quantization to speed up LLM inference, they typically require fine-tuning the LLM, incurring significant time and economic costs. Meanwhile, speculative decoding has been proposed to use small speculative models (SSMs) to accelerate the inference of LLM. However, the low acceptance rate of SSM and the high verification cost of LLM prohibit further performance improvement of inference. In this paper, we propose Minions, an LLM inference system that accelerates LLM inference with a collective and adaptive speculative generation. Specifically, Minions proposes a majority-voted mechanism to leverage multiple SSMs to jointly speculate the outputs of LLM, which improves the inference performance without introducing prohibitive computation costs for LLM. To better trade off the number of tokens speculated from SSM and the verification cost of LLM, Minions proposes an adaptive mechanism to dynamically determine the optimal speculation length of SSM, which can achieve better inference performance across different models, datasets, and hyper-parameters. In addition, Minions decouples the SSM decoding and LLM verification efficiently and adopts a pipelined execution mechanism to further improve the inference performance of LLM. By comparing with the state-of-the-art LLM inference systems, we demonstrate that Minions can achieve higher inference throughput and lower inference time.
Submitted 23 February, 2024;
originally announced February 2024.
-
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
Authors:
Liang Chen,
Yichi Zhang,
Shuhuai Ren,
Haozhe Zhao,
Zefan Cai,
Yuchi Wang,
Peiyi Wang,
Xiangdi Meng,
Tianyu Liu,
Baobao Chang
Abstract:
We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-Bench introduces three complex scenarios: autonomous driving, domestic robotics, and open-world games. Given task instructions and diverse contexts, the model is required to seamlessly integrate multiple capabilities of Perception, Cognition, and Action in a reasoning chain to make accurate decisions. Moreover, PCA-Bench features error localization capabilities, scrutinizing model inaccuracies in areas such as perception, knowledge, or reasoning. This enhances the reliability of deploying MLLMs. To balance accuracy and efficiency in evaluation, we propose PCA-Eval, an automatic evaluation protocol, and assess 10 prevalent MLLMs. The results reveal significant performance disparities between open-source models and powerful proprietary models like GPT-4 Vision. To address this, we introduce Embodied-Instruction-Evolution (EIE), an automatic framework for synthesizing instruction tuning examples in multimodal embodied environments. EIE generates 7,510 training examples in PCA-Bench and enhances the performance of open-source MLLMs, occasionally surpassing GPT-4 Vision (+3\% in decision accuracy), thereby validating the effectiveness of EIE. Our findings suggest that robust MLLMs like GPT4-Vision show promise for decision-making in embodied agents, opening new avenues for MLLM research.
Submitted 21 February, 2024;
originally announced February 2024.
-
Enhancing One-Shot Federated Learning Through Data and Ensemble Co-Boosting
Authors:
Rong Dai,
Yonggang Zhang,
Ang Li,
Tongliang Liu,
Xun Yang,
Bo Han
Abstract:
One-shot Federated Learning (OFL) has become a promising learning paradigm, enabling the training of a global server model via a single communication round. In OFL, the server model is aggregated by distilling knowledge from all client models (the ensemble), which are also responsible for synthesizing samples for distillation. In this regard, advanced works show that the performance of the server model is intrinsically related to the quality of the synthesized data and the ensemble model. To promote OFL, we introduce a novel framework, Co-Boosting, in which synthesized data and the ensemble model mutually enhance each other progressively. Specifically, Co-Boosting leverages the current ensemble model to synthesize higher-quality samples in an adversarial manner. These hard samples are then employed to promote the quality of the ensemble model by adjusting the ensembling weights for each client model. Consequently, Co-Boosting periodically achieves high-quality data and ensemble models. Extensive experiments demonstrate that Co-Boosting can substantially outperform existing baselines under various settings. Moreover, Co-Boosting eliminates the need for adjustments to the client's local training, requires no additional data or model transmission, and allows client models to have heterogeneous architectures.
Submitted 22 February, 2024;
originally announced February 2024.
-
Robust Training of Federated Models with Extremely Label Deficiency
Authors:
Yonggang Zhang,
Zhiqin Yang,
Xinmei Tian,
Nannan Wang,
Tongliang Liu,
Bo Han
Abstract:
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency. Advanced FSSL methods predominantly focus on training a single model on each client. However, this approach could lead to a discrepancy between the objective functions of labeled and unlabeled data, resulting in gradient conflicts. To alleviate gradient conflict, we propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data. In particular, Twin-sight concurrently trains a supervised model with a supervised objective function while training an unsupervised model using an unsupervised objective function. To enhance the synergy between these two models, Twin-sight introduces a neighbourhood-preserving constraint, which encourages the preservation of the neighbourhood relationship among data features extracted by both models. Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings, demonstrating the efficacy of the proposed Twin-sight.
Submitted 22 February, 2024;
originally announced February 2024.
-
Steep-spectrum AGN in eROSITA Final Equatorial-Depth Survey (eFEDS): Their host galaxies and multi-wavelength properties
Authors:
K. Iwasawa,
T. Liu,
Th. Boller,
J. Buchner,
J. Li,
T. Kawaguchi,
T. Nagao,
Y. Terashima,
Y. Toba,
J. D. Silverman,
R. Arcodia,
Th. Dauser,
M. Krumpe,
K. Nandra,
J. Wilms
Abstract:
We selected sources with a steep soft-X-ray-band spectrum with a photon index larger than 2.5 -- measured by eROSITA on board the Spectrum-Roentgen-Gamma (SRG) -- from the eFEDS AGN catalogue as candidates of highly accreting supermassive black holes, and investigated their multi-wavelength properties. Among 601 bright AGN with 0.2-5 keV counts of greater than 100, 83 sources (~14%) are classified as steep-spectrum sources. These sources have typical 0.5-2 keV luminosities of L(SX) ~ 1e44 erg/s and the majority of them are found at redshifts below z=1. In comparison with sources with flatter spectra, these sources have, on average, a UV (or optical) to 2 keV luminosity ratio that is larger by ~0.3 dex and bluer optical-to-UV continuum emission. They also appear to be radio quiet based on the detection rate in the FIRST and VLASS surveys. Their host galaxies -- at least in the redshift range of z=0.2-0.8, where the AGN-galaxy decomposition results from the Subaru Hyper Suprime-Cam imaging are available -- tend to be late-type and have smaller stellar masses than those of sources with flatter spectra. These properties are similar to those found in nearby narrow-line Seyfert 1 galaxies, in agreement with the picture that they are AGN with elevated accretion rates and are in the early growth phase of black hole and galaxy co-evolution. However, the steep-spectrum sources are not exclusively narrow-line Seyfert 1 galaxies; indeed many are broad-line Seyfert 1 galaxies, as found by a catalogue search. This suggests that these steep-spectrum sources may be black holes generally with high accretion rates but of a wide mass range, including a few objects emitting at L(SX)>1e45 erg/s, of which black hole masses can be close to 10^9 M_sun.
Submitted 21 February, 2024;
originally announced February 2024.
-
PI-CoF: A Bilevel Optimization Framework for Solving Active Learning Problems using Physics-Information
Authors:
Liqiu Dong,
Marta Zagorowska,
Tong Liu,
Alex Durkin,
Mehmet Mercangöz
Abstract:
Physics informed neural networks (PINNs) have recently been proposed as surrogate models for solving process optimization problems. However, in an active learning setting collecting enough data for reliably training PINNs poses a challenge. This study proposes a broadly applicable method for incorporating physics information into existing machine learning (ML) models of any type. The proposed method - referred to as PI-CoF for Physics-Informed Correction Factors - introduces additive or multiplicative correction factors for pointwise inference, which are identified by solving a regularized unconstrained optimization problem for reconciliation of physics information and ML model predictions. When ML models are used in an optimization context, using the proposed approach translates into a bilevel optimization problem, where the reconciliation problem is solved as an inner problem each time before evaluating the objective and constraint functions of the outer problem. The utility of the proposed approach is demonstrated through a numerical example, emphasizing constraint satisfaction in a safe Bayesian optimization (BO) setting. Furthermore, a simulation study is carried out by using PI-CoF for the real-time optimization of a fuel cell system. The results show reduced fuel consumption and better reference tracking performance when using the proposed PI-CoF approach in comparison to a constrained BO algorithm not using physics information.
Submitted 21 February, 2024;
originally announced February 2024.
-
Thermal transport in a 2D amorphous material
Authors:
Yuxi Wang,
Xingxing Zhang,
Wujuan Yan,
Nianjie Liang,
Haiyu He,
Xinwei Tao,
Ang Li,
Fuwei Yang,
Buxuan Li,
Te-Huan Liu,
Jia Zhu,
Wu Zhou,
Wei Wang,
Lin Zhou,
Bai Song
Abstract:
Two-dimensional (2D) crystals proved revolutionary soon after graphene was discovered in 2004. However, 2D amorphous materials only became accessible in 2020 and remain largely unexplored. In particular, the thermophysical properties of amorphous materials are of great interest upon transition from 3D to 2D. Here, we probe thermal transport in 2D amorphous carbon. A cross-plane thermal conductivity ($κ$) down to 0.079 $\rm{Wm}^{-1}K^{-1}$ is measured for van der Waals stacked multilayers at room temperature, which is among the lowest reported to date. Meanwhile, an unexpectedly high in-plane $κ$ is obtained for freestanding monolayers which is a few times larger than what is predicted by conventional wisdom for 3D amorphous carbon with similar $\rm{sp}^{2}$ fraction. Our molecular dynamics simulations reveal the role of disorder and highlight the impact of dimensionality. Amorphous materials at the 2D limit open up new avenues for understanding and manipulating heat at the atomic scale.
Submitted 22 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Federated Causal Discovery from Heterogeneous Data
Authors:
Loka Li,
Ignavier Ng,
Gongxu Luo,
Biwei Huang,
Guangyi Chen,
Tongliang Liu,
Bin Gu,
Kun Zhang
Abstract:
Conventional causal discovery methods rely on centralized data, which is inconsistent with the decentralized nature of data in many real-world situations. This discrepancy has motivated the development of federated causal discovery (FCD) approaches. However, existing FCD methods may be limited by their potentially restrictive assumptions of identifiable functional causal models or homogeneous data distributions, narrowing their applicability in diverse scenarios. In this paper, we propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data. We first utilize a surrogate variable corresponding to the client index to account for the data heterogeneity across different clients. We then develop a federated conditional independence test (FCIT) for causal skeleton discovery and establish a federated independent change principle (FICP) to determine causal directions. These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy. Owing to the nonparametric properties, FCIT and FICP make no assumption about particular functional forms, thereby facilitating the handling of arbitrary causal models. We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method. The code is available at https://github.com/lokali/FedCDH.git.
Submitted 26 February, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
VideoPrism: A Foundational Visual Encoder for Video Understanding
Authors:
Long Zhao,
Nitesh B. Gundavarapu,
Liangzhe Yuan,
Hao Zhou,
Shen Yan,
Jennifer J. Sun,
Luke Friedman,
Rui Qian,
Tobias Weyand,
Yue Zhao,
Rachel Hornung,
Florian Schroff,
Ming-Hsuan Yang,
David A. Ross,
Huisheng Wang,
Hartwig Adam,
Mikhail Sirotenko,
Ting Liu,
Boqing Gong
Abstract:
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips with noisy parallel text (e.g., ASR transcripts). The pretraining approach improves upon masked autoencoding by global-local distillation of semantic video embeddings and a token shuffling scheme, enabling VideoPrism to focus primarily on the video modality while leveraging the invaluable text associated with videos. We extensively test VideoPrism on four broad groups of video understanding tasks, from web video question answering to CV for science, achieving state-of-the-art performance on 31 out of 33 video understanding benchmarks.
Submitted 15 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
Authors:
Yang Zhao,
Li Du,
Xiao Ding,
Kai Xiong,
Zhouhao Sun,
Jun Shi,
Ting Liu,
Bing Qin
Abstract:
Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimal. To address this issue, we systematically analyze the impact of 48 datasets from 5 major categories of LLM pretraining data and measure their impact on LLMs using benchmarks covering nine major categories of model capabilities. Our analyses provide empirical results on the contribution of multiple corpora to the performance of LLMs, along with their joint impact patterns, including complementary, orthogonal, and correlational relationships. We also identify a set of "high-impact data", such as Books, that is significantly related to a set of model capabilities. These findings provide insights into the organization of data to support more efficient pretraining of LLMs.
Submitted 26 March, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis
Authors:
Shaochen Xu,
Zihao Wu,
Huaqin Zhao,
Peng Shu,
Zhengliang Liu,
Wenxiong Liao,
Sheng Li,
Andrea Sikora,
Tianming Liu,
Xiang Li
Abstract:
In this study, we leverage LLMs to enhance semantic analysis and develop similarity metrics for texts, addressing the limitations of traditional unsupervised NLP metrics like ROUGE and BLEU. We develop a framework where LLMs such as GPT-4 are employed for zero-shot text identification and label generation for radiology reports, where the labels are then used as measurements for text similarity. By testing the proposed framework on the MIMIC data, we find that GPT-4-generated labels can significantly improve the semantic similarity assessment, with scores more closely aligned with clinical ground truth than traditional NLP metrics. Our work demonstrates the possibility of conducting semantic analysis of text data using semi-quantitative reasoning results from LLMs for highly specialized domains. While the framework is implemented for radiology report similarity analysis, its concept can be extended to other specialized domains as well.
Submitted 20 February, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Search for the production of deuterons and antideuterons in e^+e^- annihilation at center-of-mass energies between 4.13 and 4.70 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (593 additional authors not shown)
Abstract:
Using a data sample of $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to ppπ^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed and the upper limit of the $e^+e^-\to ppπ^-\bar{d}+c.c.$ cross section is determined to be from 9.0 to 145 fb depending on the center-of-mass energy at the $90\%$ confidence level.
Submitted 17 February, 2024;
originally announced February 2024.
-
Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Authors:
Kenneth Li,
Tianle Liu,
Naomi Bashkansky,
David Bau,
Fernanda Viégas,
Hanspeter Pfister,
Martin Wattenberg
Abstract:
System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
Submitted 1 May, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Exploring Hybrid Question Answering via Program-based Prompting
Authors:
Qi Shi,
Han Cui,
Haofeng Wang,
Qingfu Zhu,
Wanxiang Che,
Ting Liu
Abstract:
Question answering over heterogeneous data requires reasoning over diverse sources of data, which is challenging due to the large scale of information and organic coupling of heterogeneous data. Various approaches have been proposed to address these challenges. One approach involves training specialized retrievers to select relevant information, thereby reducing the input length. Another approach is to transform diverse modalities of data into a single modality, simplifying the task difficulty and enabling more straightforward processing. In this paper, we propose HProPro, a novel program-based prompting framework for the hybrid question answering task. HProPro follows the code generation and execution paradigm. In addition, HProPro integrates various functions to tackle the hybrid reasoning scenario. Specifically, HProPro contains function declaration and function implementation to perform hybrid information-seeking over data from various sources and modalities, which enables reasoning over such data without training specialized retrievers or performing modal transformations. Experimental results on two typical hybrid question answering benchmarks HybridQA and MultiModalQA demonstrate the effectiveness of HProPro: it surpasses all baseline systems and achieves the best performances in the few-shot settings on both datasets.
Submitted 16 February, 2024;
originally announced February 2024.
-
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Authors:
Timothy R. McIntosh,
Teo Susnjak,
Tong Liu,
Paul Watters,
Malka N. Halgamuge
Abstract:
The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities has spurred public curiosity to evaluate and compare different LLMs, leading many researchers to propose their LLM benchmarks. Noticing preliminary inadequacies in those benchmarks, we embarked on a study to critically assess 23 state-of-the-art LLM benchmarks, using our novel unified evaluation framework through the lenses of people, process, and technology, under the pillars of functionality and security. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, evaluator diversity, and the overlooking of cultural and ideological norms in one comprehensive assessment. Our discussions emphasized the urgent need for standardized methodologies, regulatory certainties, and ethical guidelines in light of Artificial Intelligence (AI) advancements, including advocating for an evolution from static benchmarks to dynamic behavioral profiling to accurately capture LLMs' complex behaviors and potential risks. Our study highlighted the necessity for a paradigm shift in LLM evaluation methodologies, underlining the importance of collaborative efforts for the development of universally accepted benchmarks and the enhancement of AI systems' integration into society.
Submitted 15 February, 2024;
originally announced February 2024.
-
Pheno-Robot: An Auto-Digital Modelling System for In-Situ Phenotyping in the Field
Authors:
Yaoqiang Pan,
Kewei Hu,
Tianhao Liu,
Chao Chen,
Hanwen Kang
Abstract:
Accurate reconstruction of plant models for phenotyping analysis is critical for optimising sustainable agricultural practices in precision agriculture. Traditional laboratory-based phenotyping, while valuable, falls short of understanding how plants grow under uncontrolled conditions. Robotic technologies offer a promising avenue for large-scale, direct phenotyping in real-world environments. This study explores the deployment of emerging robotics and digital technology in plant phenotyping to improve performance and efficiency. Three critical functional modules: environmental understanding, robotic motion planning, and in-situ phenotyping, are introduced to automate the entire process. Experimental results demonstrate the effectiveness of the system in agricultural environments. The pheno-robot system autonomously collects high-quality data by navigating around plants. In addition, the in-situ modelling model reconstructs high-quality plant models from the data collected by the robot. The developed robotic system shows high efficiency and robustness, demonstrating its potential to advance plant science in real-world agricultural environments.
Submitted 14 February, 2024;
originally announced February 2024.
-
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
Authors:
Zhaoqing Wang,
Xiaobo Xia,
Ziye Chen,
Xiao He,
Yandong Guo,
Mingming Gong,
Tongliang Liu
Abstract:
Current state-of-the-art open-vocabulary segmentation methods typically rely on image-mask-text triplet annotations for supervision. However, acquiring such detailed annotations is labour-intensive and poses scalability challenges in complex real-world scenarios. While existing weakly-supervised approaches leverage image-text pairs to reduce the expensive annotation cost, the lack of mask supervision makes it difficult for the model to locate multiple instances and accurately group pixels with similar semantics, significantly hampering versatility and performance. In this paper, we introduce Unpair-Seg, a novel weakly-supervised open-vocabulary segmentation framework that learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected. Unpair-Seg initially predicts a set of binary masks and generates pseudo labels by identifying confident pairs of masks and text entities. We then train a feature adapter to align region embeddings with text embeddings based on these pseudo labels, achieving open-vocabulary segmentation. However, the inherent noise in the mask-entity correspondence poses a challenge to obtaining reliable pairs. To address this, we employ a vision-language large model to re-caption the input images and extract precise entities, and we design a multi-scale matching strategy to reduce noisy mask-entity pairs. Our Unpair-Seg framework demonstrates impressive performance, achieving 14.6\% and 19.5\% mIoU on the ADE-847 and PASCAL Context-459 datasets, significantly narrowing the gap between fully-supervised and weakly-supervised methods.
Submitted 11 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Authors:
Alessandro Achille,
Greg Ver Steeg,
Tian Yu Liu,
Matthew Trager,
Carson Klingenberg,
Stefano Soatto
Abstract:
Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning. In legal doctrine, however, determining the degree of similarity between works requires subjective analysis, and fact-finders (judges and juries) can demonstrate considerable variability in these subjective judgement calls. Images that are structurally similar can be deemed dissimilar, whereas images of completely different scenes can be deemed similar enough to support a claim of copying. We seek to define and compute a notion of "conceptual similarity" among images that captures high-level relations even among images that do not share repeated elements or visually similar components. The idea is to use a base multi-modal model to generate "explanations" (captions) of visual data at increasing levels of complexity. Then, similarity can be measured by the length of the caption needed to discriminate between the two images: two highly dissimilar images can be discriminated early in their description, whereas conceptually similar ones will need more detail to be distinguished. We operationalize this definition and show that it correlates with subjective (averaged human evaluation) assessment, and beats existing baselines on both image-to-image and text-to-text similarity benchmarks. Beyond just providing a number, our method also offers interpretability by pointing to the specific level of granularity of the description where the source data are differentiated.
Submitted 13 February, 2024;
originally announced February 2024.
-
Characterization of the ATLAS Liquid Argon Front-End ASIC ALFE2 for the HL-LHC upgrade
Authors:
D. Matakias,
G. Carini,
H. Chen,
M. Dabrowski,
G. Deptuch,
L. Duflot,
J. Kierstead,
T. Liu,
H. Ma,
N. Morange,
S. Rescia,
S. Tang,
H. Xu
Abstract:
ALFE2 is an ATLAS Liquid Argon Calorimeter (LAr) Front-End ASIC designed for the HL-LHC upgrade. ALFE2 comprises four channels of pre-amplifiers and CR-(RC)2 shapers with adjustable input impedance. ALFE2 features two separate gain outputs to provide 16-bit dynamic-range coverage and an optimum resolution. ALFE2 is characterized using a Front-End Test Board (FETB) based on a Zynq UltraScale+ MPSoC and two octal-channel 16-bit high-speed ADCs. The test results indicate that ALFE2 fulfills or greatly exceeds all specifications on gain, noise, linearity, uniformity, and radiation tolerance.
Submitted 13 February, 2024;
originally announced February 2024.
-
The SRG/eROSITA All-Sky Survey: Cosmology Constraints from Cluster Abundances in the Western Galactic Hemisphere
Authors:
V. Ghirardini,
E. Bulbul,
E. Artis,
N. Clerc,
C. Garrel,
S. Grandis,
M. Kluge,
A. Liu,
Y. E. Bahar,
F. Balzer,
I. Chiu,
J. Comparat,
D. Gruen,
F. Kleinebreil,
S. Krippendorf,
A. Merloni,
K. Nandra,
N. Okabe,
F. Pacaud,
P. Predehl,
M. E. Ramos-Ceja,
T. H. Reiprich,
J. S. Sanders,
T. Schrabback,
R. Seppi
, et al. (24 additional authors not shown)
Abstract:
The cluster mass function traces the growth of linear density perturbations and provides valuable insights into the growth of structures, the nature of dark matter, and the cosmological parameters governing the Universe. The primary science goal of eROSITA, on board the Spectrum Roentgen Gamma (SRG) mission, launched in 2019, is to constrain cosmology through the evolution of the cluster mass function. In this paper, we present the cosmological constraints obtained from 5259 clusters of galaxies detected over an area of 12791~deg$^2$ in the Western Galactic Hemisphere of eROSITA's first All-Sky Survey (eRASS1). The common footprint region between the eROSITA Survey and the DES, KiDS, and HSC surveys is used to calibrate the scaling between the clusters' X-ray count rate and their total mass through measurements of their weak gravitational lensing signal. eRASS1 cluster abundances constrain the $Λ$CDM parameters, namely the energy density of the total matter to $Ω_{\mathrm{m}}=0.29^{+0.01}_{-0.02}$ and the normalization of the density fluctuations to $σ_8=0.88\pm0.02$, and their combination yields $S_8=σ_8 (Ω_\mathrm{m} / 0.3)^{0.5}=0.86\pm0.01$, consistent with, and at a similar precision to, the state-of-the-art CMB measurements. The eRASS1 cosmological experiment places a most stringent upper limit on the summed masses of left-handed light neutrinos of $\sum m_ν< 0.22\mathrm{~eV}$ (95\% confidence interval). Combining eRASS1 cluster abundance measurements with CMB and ground-based neutrino oscillation experiments, we measure the summed neutrino masses to be $\sum m_ν=0.08_{-0.02}^{+0.03}\mathrm{~eV}$ or $\sum m_ν=0.12_{-0.01}^{+0.03}\mathrm{~eV}$ depending on the mass hierarchy scenario for neutrino eigenstates. eRASS1 cluster abundances significantly improve the constraints on the dark energy equation of state parameter to $w=-1.12\pm0.12$. (ABRIDGED)
Submitted 13 February, 2024;
originally announced February 2024.
-
Particle Filter SLAM for Vehicle Localization
Authors:
Tianrui Liu,
Changxin Xu,
Yuxin Qiao,
Chufeng Jiang,
Jiqiang Yu
Abstract:
Simultaneous Localization and Mapping (SLAM) presents a formidable challenge in robotics, involving the dynamic construction of a map while concurrently determining the precise location of the robotic agent within an unfamiliar environment. This intricate task is further compounded by the inherent "chicken-and-egg" dilemma, where accurate mapping relies on a dependable estimation of the robot's location, and vice versa. Moreover, the computational intensity of SLAM adds an additional layer of complexity, making it a crucial yet demanding topic in the field. In our research, we address the challenges of SLAM by adopting the Particle Filter SLAM method. Our approach leverages encoded data and fiber optic gyro (FOG) information to enable precise estimation of vehicle motion, while lidar technology contributes to environmental perception by providing detailed insights into surrounding obstacles. The integration of these data streams culminates in the establishment of a Particle Filter SLAM framework, representing a key endeavor in this paper to effectively navigate and overcome the complexities associated with simultaneous localization and mapping in robotic systems.
Submitted 19 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.