-
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Authors:
Tianqi Xu,
Linyao Chen,
Dai-Jie Wu,
Yanjun Chen,
Zecheng Zhang,
Xiang Yao,
Zhiqiang Xie,
Yongchao Chen,
Shilong Liu,
Bochen Qian,
Philip Torr,
Bernard Ghanem,
Guohao Li
Abstract:
The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the complexities of constructing tasks and evaluators. To overcome these limitations, we introduce Crab, the first agent benchmark framework designed to support cross-environment tasks, incorporating a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. Leveraging Crab, we developed a cross-platform Crab Benchmark-v0 comprising 100 tasks in computer desktop and mobile phone environments. We evaluated four advanced MLMs using different single and multi-agent system configurations on this benchmark. The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 35.26%. All framework code, agent code, and task datasets are publicly available at https://github.com/camel-ai/crab.
Submitted 1 July, 2024;
originally announced July 2024.
-
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Authors:
Tzu-Han Lin,
Chen-An Li,
Hung-yi Lee,
Yun-Nung Chen
Abstract:
Reinforcement learning from human feedback (RLHF) is a popular strategy for aligning large language models (LLMs) with desired behaviors. Reward modeling is a crucial step in RLHF. However, collecting paired preference data for training reward models is often costly and time-consuming, especially for domain-specific preferences requiring expert annotation. To address this challenge, we propose the \textbf{Do}main knowled\textbf{ge} merged \textbf{R}eward \textbf{M}odel (DogeRM), a novel framework that integrates domain-specific knowledge into a general reward model by model merging. The experiments demonstrate that DogeRM enhances performance across different benchmarks and provide a detailed analysis showcasing the effects of model merging, showing the great potential of facilitating model alignment.
Submitted 1 July, 2024;
originally announced July 2024.
-
Learning data efficient coarse-grained molecular dynamics from forces and noise
Authors:
Aleksander E. P. Durumeric,
Yaoyi Chen,
Frank Noé,
Cecilia Clementi
Abstract:
Machine-learned coarse-grained (MLCG) molecular dynamics is a promising option for modeling biomolecules. However, MLCG models currently require large amounts of data from reference atomistic molecular dynamics or substantial computation for training. Denoising score matching -- the technology behind the widely popular diffusion models -- has simultaneously emerged as a machine-learning framework for creating samples from noise. Models in the first category are often trained using atomistic forces, while those in the second category extract the data distribution by reverting noise-based corruption. We unify these approaches to improve the training of MLCG force-fields, reducing data requirements by a factor of 100 while maintaining advantages typical to force-based parameterization. The methods are demonstrated on proteins Trp-Cage and NTL9 and published as open-source code.
Submitted 1 July, 2024;
originally announced July 2024.
-
Evolutionary Morphology Towards Overconstrained Locomotion via Large-Scale, Multi-Terrain Deep Reinforcement Learning
Authors:
Yenan Chen,
Chuye Zhang,
Pengxi Gu,
Jianuo Qiu,
Jiayi Yin,
Nuofan Qiu,
Guojing Huang,
Bangchao Huang,
Zishang Zhang,
Hui Deng,
Wei Zhang,
Fang Wan,
Chaoyang Song
Abstract:
While the animals' Fin-to-Limb evolution has been well-researched in biology, such morphological transformation remains under-adopted in the modern design of advanced robotic limbs. This paper investigates a novel class of overconstrained locomotion from a design and learning perspective inspired by evolutionary morphology, aiming to integrate the concept of `intelligent design under constraints' - hereafter referred to as constraint-driven design intelligence - in developing modern robotic limbs with superior energy efficiency. We propose a 3D-printable design of robotic limbs parametrically reconfigurable as a classical planar 4-bar linkage, an overconstrained Bennett linkage, and a spherical 4-bar linkage. These limbs adopt a co-axial actuation, identical to the modern legged robot platforms, with the added capability of upgrading into a wheel-legged system. Then, we implemented a large-scale, multi-terrain deep reinforcement learning framework to train these reconfigurable limbs for a comparative analysis of overconstrained locomotion in energy efficiency. Results show that the overconstrained limbs exhibit more efficient locomotion than planar limbs during forward and sideways walking over different terrains, including floors, slopes, and stairs, with or without random noises, by saving at least 22% mechanical energy in completing the traverse task, with the spherical limbs being the least efficient. It also achieves the highest average speed of 0.85 meters per second on flat terrain, which is 20% faster than the planar limbs. This study paves the path for an exciting direction for future research in overconstrained robotics leveraging evolutionary morphology and reconfigurable mechanism intelligence when combined with state-of-the-art methods in deep reinforcement learning.
Submitted 1 July, 2024;
originally announced July 2024.
-
MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge
Authors:
Yuning Chen,
Kang Yang,
Zhiyu An,
Brady Holder,
Luke Paloutzian,
Khaled Bali,
Wan Du
Abstract:
The rapid decline in groundwater around the world poses a significant challenge to sustainable agriculture. To address this issue, agricultural managed aquifer recharge (Ag-MAR) is proposed to recharge the aquifer by artificially flooding agricultural lands using surface water. Ag-MAR requires a carefully selected flooding schedule to avoid affecting the oxygen absorption of crop roots. However, current Ag-MAR scheduling does not take into account complex environmental factors such as weather and soil oxygen, resulting in crop damage and insufficient recharging amounts. This paper proposes MARLP, the first end-to-end data-driven control system for Ag-MAR. We first formulate Ag-MAR as an optimization problem. To that end, we analyze four-year in-field datasets, which reveal the multi-periodicity feature of the soil oxygen level trends and the opportunity to use external weather forecasts and flooding proposals as exogenous clues for soil oxygen prediction. Then, we design a two-stage forecasting framework. In the first stage, it extracts both the cross-variate dependency and the periodic patterns from historical data to conduct preliminary forecasting. In the second stage, it uses weather-soil and flooding-soil causality to facilitate an accurate prediction of soil oxygen levels. Finally, we conduct model predictive control (MPC) for Ag-MAR flooding. To address the challenge of large action spaces, we devise a heuristic planning module to reduce the number of flooding proposals to enable the search for optimal solutions. Real-world experiments show that MARLP reduces the oxygen deficit ratio by 86.8% while improving the recharging amount in unit time by 35.8%, compared with the previous four years.
Submitted 1 July, 2024;
originally announced July 2024.
-
Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving
Authors:
Ran Tian,
Boyi Li,
Xinshuo Weng,
Yuxiao Chen,
Edward Schmerling,
Yue Wang,
Boris Ivanovic,
Marco Pavone
Abstract:
The autonomous driving industry is increasingly adopting end-to-end learning from sensory inputs to minimize human biases in system design. Traditional end-to-end driving models, however, suffer from long-tail events due to rare or unseen inputs within their training distributions. To address this, we propose TOKEN, a novel Multi-Modal Large Language Model (MM-LLM) that tokenizes the world into object-level knowledge, enabling better utilization of LLM's reasoning capabilities to enhance autonomous vehicle planning in long-tail scenarios. TOKEN effectively alleviates data scarcity and inefficient tokenization by leveraging a traditional end-to-end driving model to produce condensed and semantically enriched representations of the scene, which are optimized for LLM planning compatibility through deliberate representation and reasoning alignment training stages. Our results demonstrate that TOKEN excels in grounding, reasoning, and planning capabilities, outperforming existing frameworks with a 27% reduction in trajectory L2 error and a 39% decrease in collision rates in long-tail scenarios. Additionally, our work highlights the importance of representation alignment and structured reasoning in sparking the common-sense reasoning capabilities of MM-LLMs for effective planning.
Submitted 1 July, 2024;
originally announced July 2024.
-
Influence of fluid rheology on multistability in the unstable flow of polymer solutions through pore constriction arrays
Authors:
Emily Y. Chen,
Sujit S. Datta
Abstract:
Diverse chemical, energy, environmental, and industrial processes involve the flow of polymer solutions in porous media. The accumulation and dissipation of elastic stresses as the polymers are transported through the tortuous, confined pore space can lead to the development of an elastic flow instability above a threshold flow rate. This flow instability can generate complex flows with strong spatiotemporal fluctuations, despite the low Reynolds number ($\mathrm{Re} \ll 1$); for example, in 1D ordered arrays of pore constrictions, this unstable flow can be multistable, with distinct pores exhibiting distinct unstable flow states. Here, we examine how this multistability is influenced by fluid rheology. Through experiments using diverse polymer solutions having systematic variations in fluid shear-thinning or elasticity, in pore constriction arrays of varying geometries, we show that the onset of multistability can be described using a single dimensionless parameter. This parameter, the streamwise Deborah number, compares the stress relaxation time of the polymer solution to the time required for the fluid to be advected between pore constrictions. Our work thus helps to deepen understanding of the influence of fluid rheology on elastic instabilities, helping to establish guidelines for the rational design of polymeric fluids with desirable flow behaviors.
Submitted 30 June, 2024;
originally announced July 2024.
-
Learning System Dynamics without Forgetting
Authors:
Xikun Zhang,
Dongjin Song,
Yushan Jiang,
Yixin Chen,
Dacheng Tao
Abstract:
Predicting the trajectories of systems with unknown dynamics (\textit{i.e.} the governing rules) is crucial in various research fields, including physics and biology. This challenge has gathered significant attention from diverse communities. Most existing works focus on learning fixed system dynamics within one single system. However, real-world applications often involve multiple systems with different types of dynamics or evolving systems with non-stationary dynamics (dynamics shifts). When data from those systems are continuously collected and sequentially fed to machine learning models for training, these models tend to be biased toward the most recently learned dynamics, leading to catastrophic forgetting of previously observed/learned system dynamics. To this end, we aim to learn system dynamics via continual learning. Specifically, we present a novel framework of Mode-switching Graph ODE (MS-GODE), which can continually learn varying dynamics and encode the system-specific dynamics into binary masks over the model parameters. During the inference stage, the model can select the most confident mask based on the observational data to identify the system and predict future trajectories accordingly. Empirically, we systematically investigate the task configurations and compare the proposed MS-GODE with state-of-the-art techniques. More importantly, we construct a novel benchmark of biological dynamic systems, featuring diverse systems with disparate dynamics and significantly enriching the research field of machine learning for dynamic systems.
Submitted 30 June, 2024;
originally announced July 2024.
-
HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability
Authors:
Yanfang Chen,
Ding Chen,
Shichao Song,
Simin Niu,
Hanyu Wang,
Zeyun Tang,
Feiyu Xiong,
Zhiyu Li
Abstract:
As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-source dataset of health rumor information, as well as effective and reliable rumor detection methods. This paper addresses this gap by constructing a dataset containing 1.12 million health-related rumors (HealthRCN) through web scraping of common health-related questions and a series of data processing steps. HealthRCN is the largest known dataset of Chinese health information rumors to date. Based on this dataset, we propose retrieval-augmented large language models for Chinese health rumor detection and explainability (HRDE). This model leverages retrieved relevant information to accurately determine whether the input health information is a rumor and provides explanatory responses, effectively aiding users in verifying the authenticity of health information. In evaluation experiments, we compared multiple models and found that HRDE outperformed them all, including GPT-4-1106-Preview, in rumor detection accuracy and answer quality. HRDE achieved an average accuracy of 91.04% and an F1 score of 91.58%.
Submitted 30 June, 2024;
originally announced July 2024.
-
Hybrid Quantum-Classical Clustering for Preparing a Prior Distribution of Eigenspectrum
Authors:
Mengzhen Ren,
Yu-Cheng Chen,
Ching-Jui Lai,
Min-Hsiu Hsieh,
Alice Hu
Abstract:
Determining the energy gap in a quantum many-body system is critical to understanding its behavior and is important in quantum chemistry and condensed matter physics. The challenge of determining the energy gap requires identifying both the excited and ground states of a system. In this work, we consider preparing the prior distribution and circuits for the eigenspectrum of time-independent Hamiltonians, which can benefit both classical and quantum algorithms for solving eigenvalue problems. The proposed algorithm unfolds in three strategic steps: Hamiltonian transformation, parameter representation, and classical clustering. These steps are underpinned by two key insights: the use of quantum circuits to approximate the ground state of transformed Hamiltonians and the analysis of parameter representation to distinguish between eigenvectors. The algorithm is showcased through applications to the 1D Heisenberg system and the LiH molecular system, highlighting its potential for both near-term quantum devices and fault-tolerant quantum devices. The paper also explores the scalability of the method and its performance across various settings, setting the stage for more resource-efficient quantum computations that are both accurate and fast. The findings presented here mark a new insight into hybrid algorithms, offering a pathway to overcoming current computational challenges.
Submitted 29 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
Submitted 28 June, 2024;
originally announced July 2024.
-
Pseudo grid-based physics-informed convolutional-recurrent network solving the integrable nonlinear lattice equations
Authors:
Zhe Lin,
Yong Chen
Abstract:
Traditional discrete learning methods involve discretizing continuous equations using difference schemes, necessitating considerations of stability and convergence. Integrable nonlinear lattice equations possess a profound mathematical structure that enables them to revert to continuous integrable equations in the continuous limit, particularly retaining integrable properties such as conservation laws, Hamiltonian structure, and multiple soliton solutions. The pseudo grid-based physics-informed convolutional-recurrent network (PG-PhyCRNet) is proposed to investigate the localized wave solutions of integrable lattice equations, which significantly enhances the model's extrapolation capability to lattice points beyond the temporal domain. We conduct a comparative analysis of PG-PhyCRNet with and without pseudo grid by investigating the multi-soliton solutions and rational solitons of the Toda lattice and self-dual network equation. The results indicate that the PG-PhyCRNet excels in capturing long-term evolution and enhances the model's extrapolation capability for solitons, particularly those with steep waveforms and high wave speeds. Finally, the robustness of the PG-PhyCRNet method and its effect on the prediction of solutions in different scenarios are confirmed through repeated experiments involving pseudo grid partitioning.
Submitted 25 June, 2024;
originally announced July 2024.
-
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Authors:
Yicheng Chen,
Xiangtai Li,
Yining Li,
Yanhong Zeng,
Jianzong Wu,
Xiangyu Zhao,
Kai Chen
Abstract:
Diffusion-based models have shown great potential in generating high-quality images with various layouts, which can benefit downstream perception tasks. However, a fully automatic layout generation driven only by language and a suitable metric for measuring multiple generated instances has not been well explored. In this work, we present Auto Cherry-Picker (ACP), a novel framework that generates high-quality multi-modal training examples to augment perception and multi-modal training. Starting with a simple list of natural language concepts, we prompt large language models (LLMs) to generate a detailed description and design reasonable layouts. Next, we use an off-the-shelf text-to-image model to generate multiple images. Then, the generated data are refined using a comprehensively designed metric to ensure quality. In particular, we present a new metric, Composite Layout and Image Score (CLIS), to evaluate the generated images fairly. Our synthetic high-quality examples boost performance in various scenarios by customizing the initial concept list, especially in addressing challenges associated with long-tailed distribution and imbalanced datasets. Experiment results on downstream tasks demonstrate that Auto Cherry-Picker can significantly improve the performance of existing models. In addition, we have thoroughly investigated the correlation between CLIS and performance gains in downstream tasks, and we find that a better CLIS score results in better performance. This finding shows the potential for evaluation metrics as the role for various visual perception and MLLM tasks. Code will be available.
Submitted 28 June, 2024;
originally announced June 2024.
-
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Model
Authors:
Hieu T. Nguyen,
Yiwen Chen,
Vikram Voleti,
Varun Jampani,
Huaizu Jiang
Abstract:
We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete large 3D indoor scene (e.g., a house). Our key insight is to adapt a 2D diffusion model, which is trained on web-scale images, to generate consistent multi-view color (RGB) and depth (D) images across different locations of the scene. Specifically, the RGB-D images are generated autoregressively in a batch-wise manner along sampled locations based on the floorplan, where previously generated images are used as conditions for the diffusion model to produce images at nearby locations. The global floorplan and attention design in the diffusion model ensure the consistency of the generated images, from which a 3D scene can be reconstructed. Through extensive evaluation on the 3D-Front dataset, we demonstrate that HouseCrafter can generate high-quality house-scale 3D scenes. Ablation studies also validate the effectiveness of different design choices. We will release our code and model weights. Project page: https://neu-vi.github.io/houseCrafter/
Submitted 28 June, 2024;
originally announced June 2024.
-
eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking
Authors:
Yucheng Chen,
Lin Wang
Abstract:
The unique complementarity of frame-based and event cameras for high frame rate object tracking has recently inspired some research attempts to develop multi-modal fusion approaches. However, these methods directly fuse both modalities and thus ignore the environmental attributes, e.g., motion blur, illumination variance, occlusion, scale variation, etc. Meanwhile, no interaction between search and template features makes distinguishing target objects and backgrounds difficult. As a result, performance degradation is induced especially in challenging conditions. This paper proposes a novel and effective Transformer-based event-guided tracking framework, called eMoE-Tracker, which achieves new SOTA performance under various conditions. Our key idea is to disentangle the environment into several learnable attributes to dynamically learn the attribute-specific features for better interaction and discriminability between the target information and background. To achieve the goal, we first propose an environmental Mix-of-Experts (eMoE) module that is built upon the environmental Attributes Disentanglement to learn attribute-specific features and environmental Attributes Gating to assemble the attribute-specific features by the learnable attribute scores dynamically. The eMoE module is a subtle router that fine-tunes the transformer backbone more efficiently. We then introduce a contrastive relation modeling (CRM) module to improve interaction and discriminability between the target information and background. Extensive experiments on diverse event-based benchmark datasets showcase the superior performance of our eMoE-Tracker compared to the prior arts.
Submitted 28 June, 2024;
originally announced June 2024.
-
The synthetic gauge field and exotic vortex phase with spin-orbital-angular-momentum coupling
Authors:
Yingqi Liu,
Yun Chen,
Yuangang Deng
Abstract:
Ultracold atoms endowed with tunable spin-orbital-angular-momentum coupling (SOAMC) represent a promising avenue for delving into exotic quantum phenomena. Building on recent experimental advancements, we propose the generation of synthetic gauge fields ,and by including exotic vortex phases within spinor Bose-Einstein condensates, employing a combination of a running wave and Laguerre-Gaussian la…
▽ More
Ultracold atoms endowed with tunable spin-orbital-angular-momentum coupling (SOAMC) represent a promising avenue for delving into exotic quantum phenomena. Building on recent experimental advancements, we propose the generation of synthetic gauge fields ,and by including exotic vortex phases within spinor Bose-Einstein condensates, employing a combination of a running wave and Laguerre-Gaussian laser fields. We investigate the ground-state characteristics of the SOAMC condensate, revealing the emergence of exotic vortex states with controllable orbital angular momenta. It is shown that the interplay of the SOAMC and conventional spin-linear-momentum coupling induced by the running wave beam leads to the formation of a vortex state exhibiting a phase stripe hosting single multiply quantized singularity. The phase of the ground state will undergo the phase transition corresponding to the breaking of rotational symmetry while preserving the mirror symmetry. Importantly, the observed density distribution of the ground-state wavefunction, exhibiting broken rotational symmetry, can be well characterized by the synthetic magnetic field generated through light interaction with the dressed spin state. Our findings pave the way for further exploration into the rotational properties of stable exotic vortices with higher orbital angular momenta against splitting in the presence of synthetic gauge fields in ultracold quantum gases.
Submitted 28 June, 2024;
originally announced June 2024.
-
YuLan: An Open-source Large Language Model
Authors:
Yutao Zhu,
Kun Zhou,
Kelong Mao,
Wentong Chen,
Yiding Sun,
Zhipeng Chen,
Qian Cao,
Yihan Wu,
Yushuo Chen,
Feng Wang,
Lei Zhang,
Junyi Li,
Xiaolei Wang,
Lei Wang,
Beichen Zhang,
Zican Dong,
Xiaoxue Cheng,
Yuhan Chen,
Xinyu Tang,
Yupeng Hou,
Qiangqiang Ren,
Xincheng Pang,
Shufang Xie,
Wayne Xin Zhao,
Zhicheng Dou
, et al. (13 additional authors not shown)
Abstract:
Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billion parameters. The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was finished in January 2024, and the model has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and code are available at https://github.com/RUC-GSAI/YuLan-Chat.
Submitted 28 June, 2024;
originally announced June 2024.
-
Self-Supervised Spatial-Temporal Normality Learning for Time Series Anomaly Detection
Authors:
Yutong Chen,
Hongzuo Xu,
Guansong Pang,
Hezhe Qiao,
Yuan Zhou,
Mingsheng Shang
Abstract:
Time Series Anomaly Detection (TSAD) finds widespread applications across various domains such as financial markets, industrial production, and healthcare. Its primary objective is to learn the normal patterns of time series data, thereby identifying deviations in test samples. Most existing TSAD methods focus on modeling data from the temporal dimension, while ignoring the semantic information in the spatial dimension. To address this issue, we introduce a novel approach, called Spatial-Temporal Normality learning (STEN). STEN is composed of a sequence Order prediction-based Temporal Normality learning (OTN) module that captures the temporal correlations within sequences, and a Distance prediction-based Spatial Normality learning (DSN) module that learns the relative spatial relations between sequences in a feature space. By synthesizing these two modules, STEN learns expressive spatial-temporal representations for the normal patterns hidden in the time series data. Extensive experiments on five popular TSAD benchmarks show that STEN substantially outperforms state-of-the-art competing methods. Our code is available at https://github.com/mala-lab/STEN.
Submitted 28 June, 2024;
originally announced June 2024.
-
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
Authors:
Hongzhan Lin,
Ang Lv,
Yuhan Chen,
Chen Zhu,
Yang Song,
Hengshu Zhu,
Rui Yan
Abstract:
Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as position embeddings, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within LLMs and a lightweight router-only training optimization strategy: (1) MoICE views each RoPE angle as an 'in-context' expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly processes tokens using multiple RoPE angles dynamically selected by the router to attend to the needed positions. This approach mitigates the risk of overlooking essential contextual information. (2) The router-only training strategy entails freezing LLM parameters and exclusively updating routers for only a few steps. When applied to open-source LLMs including Llama and Mistral, MoICE surpasses prior methods across multiple tasks on long context understanding and generation, all while maintaining commendable inference efficiency.
Submitted 27 June, 2024;
originally announced June 2024.
-
ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos
Authors:
Jr-Jen Chen,
Yu-Chien Liao,
Hsi-Che Lin,
Yu-Chu Yu,
Yen-Chun Chen,
Yu-Chiang Frank Wang
Abstract:
We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e. human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across video segments, poses significant challenges to even the frontier multimodal large language models. To facilitate this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, significantly reducing the need for labor-intensive manual annotations. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models outperform academic models, they still lag behind human performance by a significant 14.3% accuracy gap. Additionally, our pipeline creates a training dataset of 9,695 machine generated samples without manual effort, which empirical studies suggest can enhance the across-time reasoning via fine-tuning.
Submitted 27 June, 2024;
originally announced June 2024.
-
A Machine Learning Method for Monte Carlo Calculations of Radiative Processes
Authors:
William Charles,
Alexander Y. Chen
Abstract:
Radiative processes such as synchrotron radiation and Compton scattering play an important role in astrophysics. Radiative processes are fundamentally stochastic in nature, and the best tools currently used for resolving these processes computationally are Monte Carlo (MC) methods. These methods typically draw a large number of samples from a complex distribution such as the differential cross section for electron-photon scattering, and then use these samples to compute the radiation properties such as angular distribution, spectrum, and polarization. In this work we propose a machine learning (ML) technique for efficient sampling from arbitrary known probability distributions that can be used to accelerate Monte Carlo calculation of radiative processes in astrophysical scenarios. In particular, we apply our technique to inverse Compton radiation and find that our ML method can be up to an order of magnitude faster than traditional methods currently in use.
Submitted 27 June, 2024;
originally announced June 2024.
-
AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation
Authors:
Jia Fu,
Xiaoting Qin,
Fangkai Yang,
Lu Wang,
Jue Zhang,
Qingwei Lin,
Yubo Chen,
Dongmei Zhang,
Saravan Rajmohan,
Qi Zhang
Abstract:
Recent advancements in Large Language Models have transformed ML/AI development, necessitating a reevaluation of AutoML principles for Retrieval-Augmented Generation (RAG) systems. To address the challenges of hyper-parameter optimization and online adaptation in RAG, we propose the AutoRAG-HP framework, which formulates hyper-parameter tuning as an online multi-armed bandit (MAB) problem and introduces a novel two-level Hierarchical MAB (Hier-MAB) method for efficient exploration of large search spaces. We conduct extensive experiments on tuning hyper-parameters, such as top-k retrieved documents, prompt compression ratio, and embedding methods, using the ALCE-ASQA and Natural Questions datasets. Our evaluation of jointly optimizing all three hyper-parameters demonstrates that MAB-based online learning methods can achieve Recall@5 $\approx 0.8$ for scenarios with prominent gradients in the search space, using only $\sim20\%$ of the LLM API calls required by the Grid Search approach. Additionally, the proposed Hier-MAB approach outperforms other baselines in more challenging optimization scenarios. The code will be made available at https://aka.ms/autorag.
Submitted 27 June, 2024;
originally announced June 2024.
-
Coordinated RSMA for Integrated Sensing and Communication in Emergency UAV Systems
Authors:
Binghan Yao,
Ruoguang Li,
Yingyang Chen,
Li Wang
Abstract:
Recently, unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) has emerged as a promising technique for achieving robust and rapid emergency response capabilities. Such a novel framework offers high-quality and cost-efficient C&S services due to the intrinsic flexibility and mobility of UAVs. In parallel, rate-splitting multiple access (RSMA) is able to achieve tailor-made communication by splitting the messages into private and common parts with adjustable rates, making it suitable for on-demand data transmission in disaster scenarios. In this paper, we propose a coordinated RSMA for integrated sensing and communication (CoRSMA-ISAC) scheme in emergency UAV systems to facilitate search and rescue operations, where a number of ISAC UAVs simultaneously communicate with multiple communication survivors (CSs) and detect a potentially trapped survivor (TS) in a coordinated manner. Towards this end, an optimization problem is formulated to maximize the weighted sum rate (WSR) of the system, subject to the sensing signal-to-noise ratio (SNR) requirement. In order to solve the formulated non-convex problem, we first decompose it into three subproblems, i.e., UAV-CS association, UAV deployment, as well as beamforming optimization and rate allocation. Subsequently, we introduce an iterative optimization approach leveraging K-Means, successive convex approximation (SCA), and semi-definite relaxation (SDR) algorithms to reframe the subproblems into a more tractable form and efficiently solve them. Simulation results demonstrate that the proposed CoRSMA-ISAC scheme is superior to conventional space division multiple access (SDMA), non-orthogonal multiple access (NOMA), and orthogonal multiple access (OMA) in terms of both communication and sensing performance.
Submitted 27 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
Exploiting Structured Sparsity in Near Field: From the Perspective of Decomposition
Authors:
Xufeng Guo,
Yuanbin Chen,
Ying Wang,
Chau Yuen
Abstract:
The structured sparsity can be leveraged in traditional far-field channels, greatly facilitating efficient sparse channel recovery by compressing the complexity of overheads to the level of the scatterer number. However, when experiencing a fundamental shift from planar-wave-based far-field modeling to spherical-wave-based near-field modeling, whether these benefits persist in the near-field regime remains an open issue. To answer this question, this article delves into structured sparsity in the near-field realm, examining its peculiarities and challenges. In particular, we present the key features of near-field structured sparsity in contrast to the far-field counterpart, drawing from both physical and mathematical perspectives. Upon unmasking the theoretical bottlenecks, we resort to bypassing them by decoupling the geometric parameters of the scatterers, termed the triple parametric decomposition (TPD) framework. It is demonstrated that our novel TPD framework can achieve robust recovery of near-field sparse channels by applying the potential structured sparsity and avoiding the curse of complexity and overhead.
Submitted 27 June, 2024;
originally announced June 2024.
-
Formation Under Communication Constraints: Control Performance Meets Channel Capacity
Authors:
Yaru Chen,
Yirui Cong,
Xiangyun Zhou,
Long Cheng,
Xiangke Wang
Abstract:
In wireless communication-based formation control systems, the control performance is significantly impacted by the channel capacity of each communication link between agents. This relationship, however, remains under-investigated in existing studies. To address this gap, the formation control problem of classical second-order multi-agent systems with bounded process noises is considered, taking into account the channel capacity. More specifically, the model of communication links between agents is first established, based on a new concept, the guaranteed communication region, which characterizes all possible locations for successful message decoding in the presence of control-system uncertainty. Furthermore, we rigorously prove that the guaranteed communication region does not unboundedly increase with the transmission time, which indicates an important trade-off between the guaranteed communication region and the data rate. The fundamental limits of data rate for any desired accuracy are also obtained. Finally, an integrated design to achieve the desired formation accuracy is proposed, where an estimation-based controller and a transmit power control strategy are developed.
Submitted 27 June, 2024;
originally announced June 2024.
-
Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models
Authors:
Yicheng Xu,
Yuxin Chen,
Jiahao Nie,
Yusong Wang,
Huiping Zhuang,
Manabu Okumura
Abstract:
Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which only focuses on previously encountered classes. During the CL of VLMs, we need not only to prevent the catastrophic forgetting on incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain such zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which utilizes a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and decouple the cross-domain correlations by projecting features to a higher-dimensional space. Cooperating with a training-free fusion module, RAIL absolutely preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting. In this setting, a CL learner is required to incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization on incrementally learned domains. Experiment results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings. The code will be released upon acceptance.
Submitted 26 June, 2024;
originally announced June 2024.
-
Decoding-Time Language Model Alignment with Multiple Objectives
Authors:
Ruizhe Shi,
Yifang Chen,
Yushi Hu,
Alisa Liu,
Hannaneh Hajishirzi,
Noah A. Smith,
Simon Du
Abstract:
Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose multi-objective decoding (MOD), a decoding-time algorithm that outputs the next token from a linear combination of predictions of all base models, for any given weightings over different objectives. We exploit a common form among a family of $f$-divergence regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution by Legendre transform, and derive an efficient decoding strategy. Theoretically, we show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method. Empirical results demonstrate the effectiveness of the algorithm. For example, compared to a parameter-merging baseline, MOD achieves 12.8% overall reward improvement when equally optimizing towards $3$ objectives. Moreover, we experiment with MOD on combining three fully-finetuned LLMs of different model sizes, each aimed at different objectives such as safety, coding, and general user preference. Unlike traditional methods that require careful curation of a mixture of datasets to achieve comprehensive improvement, we can quickly experiment with preference weightings using MOD to find the best combination of models. Our best combination reduces toxicity on Toxigen to nearly 0% and achieves 7.9--33.3% improvement across the other three metrics (i.e., Codex@1, GSM-COT, BBH-COT).
Submitted 28 June, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach
Authors:
Yuxiang Huang,
Yuhao Chen,
John Zelek
Abstract:
Motion segmentation from a single moving camera presents a significant challenge in the field of computer vision. This challenge is compounded by the unknown camera movements and the lack of depth information of the scene. While deep learning has shown impressive capabilities in addressing these issues, supervised models require extensive training on massive annotated datasets, and unsupervised models also require training on large volumes of unannotated data, presenting significant barriers for both. In contrast, traditional methods based on optical flow do not require training data; however, they often fail to capture object-level information, leading to over-segmentation or under-segmentation. In addition, they struggle in complex scenes with substantial depth variations and non-rigid motion, due to their overreliance on optical flow. To overcome these challenges, we propose an innovative hybrid approach that leverages the advantages of both deep learning methods and traditional optical flow based methods to perform dense motion segmentation without requiring any training. Our method begins by automatically generating object proposals for each frame using foundation models. These proposals are then clustered into distinct motion groups using both optical flow and relative depth maps as motion cues. The integration of depth maps derived from state-of-the-art monocular depth estimation models significantly enhances the motion cues provided by optical flow, particularly in handling motion parallax issues. Our method is evaluated on the DAVIS-Moving and YTVOS-Moving datasets, and the results demonstrate that our method outperforms the best unsupervised method and closely matches the state-of-the-art supervised methods.
Submitted 26 June, 2024;
originally announced June 2024.
-
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Authors:
Xin Lai,
Zhuotao Tian,
Yukang Chen,
Senqiao Yang,
Xiangru Peng,
Jiaya Jia
Abstract:
Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring the correctness of each reasoning step is critical. To address this, we aim to enhance the robustness and factuality of LLMs by learning from human feedback. However, Direct Preference Optimization (DPO) has shown limited benefits for long-chain mathematical reasoning, as models employing DPO struggle to identify detailed errors in incorrect answers. This limitation stems from a lack of fine-grained process supervision. We propose a simple, effective, and data-efficient method called Step-DPO, which treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically. Additionally, we have developed a data construction pipeline for Step-DPO, enabling the creation of a high-quality dataset containing 10K step-wise preference pairs. We also observe that in DPO, self-generated data is more effective than data generated by humans or GPT-4, due to the latter's out-of-distribution nature. Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters. Notably, Step-DPO, when applied to Qwen2-72B-Instruct, achieves scores of 70.8% and 94.0% on the test sets of MATH and GSM8K, respectively, surpassing a series of closed-source models, including GPT-4-1106, Claude-3-Opus, and Gemini-1.5-Pro. Our code, data, and models are available at https://github.com/dvlab-research/Step-DPO.
Submitted 26 June, 2024;
originally announced June 2024.
-
Composition Vision-Language Understanding via Segment and Depth Anything Model
Authors:
Mingxiao Huo,
Pengliang Ji,
Haotian Lin,
Junchen Liu,
Yixiao Wang,
Yijun Chen
Abstract:
We introduce a pioneering unified library that leverages the Depth Anything and Segment Anything models to augment neural comprehension in language-vision model zero-shot understanding. This library synergizes the capabilities of the Depth Anything Model (DAM), Segment Anything Model (SAM), and GPT-4V, enhancing multimodal tasks such as vision-question-answering (VQA) and composition reasoning. Through the fusion of segmentation and depth analysis at the symbolic instance level, our library provides nuanced inputs for language models, significantly advancing image interpretation. Validated across a spectrum of in-the-wild real-world images, our findings showcase progress in vision-language models through neural-symbolic integration. This novel approach melds visual and language analysis in an unprecedented manner. Overall, our library opens new directions for future research aimed at decoding the complexities of the real world through advanced multimodal technologies. Our code is available at https://github.com/AnthonyHuo/SAM-DAM-for-Compositional-Reasoning.
Submitted 7 June, 2024;
originally announced June 2024.
-
Waveform Learning under Phase Noise Impairment for Sub-THz Communications
Authors:
Dileepa Marasinghe,
Le Hang Nguyen,
Jafar Mohammadi,
Yejian Chen,
Thorsten Wild,
Nandana Rajatheva
Abstract:
The large untapped spectrum in sub-THz allows for ultra-high throughput communication to realize many seemingly impossible applications in 6G. Phase noise (PN) is one key hardware impairment, which is accentuated as we increase the frequency and bandwidth. Furthermore, the modest output power of the power amplifier demands limits on peak-to-average power ratio (PAPR) signal design. In this work, we design a PN-robust, low PAPR single-carrier (SC) waveform by geometrically shaping the constellation and adapting the pulse shaping filter pair under practical PN modeling and adjacent channel leakage ratio (ACLR) constraints for a given excess bandwidth. We optimize the waveforms under conventional and state-of-the-art PN-aware demappers. Moreover, we introduce a neural-network (NN) demapper enhancing transceiver adaptability. We formulate the waveform optimization problem in its augmented Lagrangian form and use a back-propagation-inspired technique to obtain a design that is numerically robust to PN, while adhering to PAPR and ACLR constraints. The results substantiate the efficacy of the method, yielding up to 2.5 dB in the required Eb/N0 under stronger PN along with a PAPR reduction of 0.5 dB. Moreover, PAPR reductions up to 1.2 dB are possible with competitive BLER and SE performance in both low and high PN conditions.
Submitted 5 June, 2024;
originally announced June 2024.
-
Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model
Authors:
Yiming Chen,
Haobin Chen,
Simin Liu,
Yunyun Liu,
Fanhao Zhou,
Bing Wei
Abstract:
With the continuous advancement of artificial intelligence, natural language processing technology has become widely utilized in various fields. At the same time, there are many challenges in creating Chinese news summaries. First of all, the semantics of Chinese news is complex, and the amount of information is enormous. Extracting critical information from Chinese news presents a significant challenge. Second, the news summary should be concise and clear, focusing on the main content and avoiding redundancy. In addition, the particularity of the Chinese language, such as polysemy, word segmentation, etc., makes it challenging to generate Chinese news summaries. Based on the above, this paper studies the information extraction method of the LCSTS dataset based on an improved BERTSum-LSTM model. We improve the BERTSum-LSTM model to make it perform better in generating Chinese news summaries. The experimental results show that the proposed method has a good effect on creating news summaries, which is of great importance to the construction of news summaries.
Submitted 26 June, 2024;
originally announced June 2024.
-
Non-Markovian Quantum Exceptional Points
Authors:
Jhen-Dong Lin,
Po-Chen Kuo,
Neill Lambert,
Adam Miranowicz,
Franco Nori,
Yueh-Nan Chen
Abstract:
Exceptional points (EPs) are singularities in the spectra of non-Hermitian operators, where eigenvalues and eigenvectors coalesce. Recently, open quantum systems have been increasingly explored as EP testbeds due to their natural non-Hermitian nature. However, existing works mostly focus on the Markovian limit, leaving a gap in understanding EPs in the non-Markovian regime. In this work, we address this gap by proposing a theoretical framework based on two numerically exact descriptions of non-Markovian dynamics: the pseudomode mapping and the hierarchical equations of motion. The proposed framework enables conventional spectral analysis for EP identification, establishing direct links between EPs and dynamic manifestations in open systems, such as non-exponential decays and enhanced sensitivity to external perturbations. We unveil pure non-Markovian EPs that are unobservable in the Markovian limit. Remarkably, the EP aligns with the Markovian-to-non-Markovian transition, and the EP condition is adjustable by modifying environmental spectral properties. Moreover, we show that structured environments can elevate EP order, thereby enhancing the system's sensitivity. These findings lay a theoretical foundation and open new avenues for non-Markovian reservoir engineering and non-Hermitian physics.
Submitted 26 June, 2024;
originally announced June 2024.
-
Cascaded multi-phonon stimulated Raman scattering near second-harmonic-generation in thin-film lithium niobate microdisk
Authors:
Yuxuan He,
Xiongshuo Yan,
Jiangwei Wu,
Xiangmin Liu,
Yu** Chen,
Xianfeng Chen
Abstract:
High-quality microresonators can greatly enhance light-matter interactions and are excellent platforms for studying nonlinear optics. Wavelength conversion through nonlinear processes is the key to many applications of integrated optics. The stimulated Raman scattering process can extend the emission wavelength of a laser source to a wider range. Lithium niobate, as a Raman active crystalline material, has remarkable potential for wavelength conversion. Here, we demonstrate the generation of cascaded multi-phonon Raman signals near the second-harmonic-generation peak in an X-cut thin-film lithium niobate microdisk. Fine tuning of the specific cascaded Raman spectral lines has also been achieved by changing the pump wavelength. Raman lines can reach wavelengths up to about 80 nm away from the SHG signal. We realize the SFG process associated with Raman signals in the visible range as well. Our work extends the use of WGM microresonators as effective optical upconversion wavelength converters in nonlinear optical applications.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\bar{Ξ}^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\bar{Ξ}^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\bar{Ξ}^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\bar{Ξ}^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\bar{Ξ}^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Protocols for Obtaining Reliable PDFs from Laboratory x-ray Sources Using PDFgetX3
Authors:
Till Schertenleib,
Daniel Schmuckler,
Yucong Chen,
Geng Bang **,
Wendy L. Queen,
Simon J. L. Billinge
Abstract:
In this work, we explored data acquisition protocols and improved data reduction protocols using PDFgetX3 to obtain reliable data for atomic pair distribution function (PDF) analysis from a laboratory-based Mo x-ray source. A variable counting scheme is described that preferentially counts in the high-angle region of the diffraction pattern. The effects on the resulting PDF are studied by varying the overall count time, the use of Soller slits, and limiting the out-of-plane divergence of the incident beam. The protocols are tested using an amorphous silica and a quartz sample. We also present a modification to the current PDFgetX3 data corrections to take care of sample absorption, which was previously neglected in the use of that program for high-energy synchrotron x-ray data. We show that, despite limitations in the Q-range and flux of laboratory instruments, reasonable data for PDF model fits may be obtained using the best protocols in a few hours of counting.
Submitted 26 June, 2024;
originally announced June 2024.
-
Generation of spatiotemporal acoustic vortices with arbitrarily oriented orbital angular momentum
Authors:
Shuai Liu,
Hao Ge,
Xiang-Yuan Xu,
Yuan Sun,
Xiao-** Liu,
Ming-Hui Lu,
Yan-Feng Chen
Abstract:
Despite extensive exploration of acoustic vortices carrying orbital angular momentum (OAM), the generation of acoustic vortices with OAM orientations beyond the conventional longitudinal direction remains largely unexplored. Spatiotemporal (ST) vortices, featuring spiral phase twisting in the ST domain and carrying transverse OAM, have recently attracted considerable interest in optics and acoustics. Here, we report the generation of three-dimensional (3D) ST acoustic vortices with arbitrarily oriented OAM, thereby opening up a new dimension in acoustic OAM control. By utilizing a two-dimensional (2D) acoustic phased array, we introduce two approaches to manipulate the orientation of OAM: through the direct rotation of vortices in 3D space and the intersection of vortices carrying distinct types of OAM. These methods enable unprecedented control over the orientation of acoustic OAM, providing a new degree of freedom in the manipulation of acoustic waves. The arbitrarily oriented OAM holds promise for enhancing acoustic communication by broadening capacity and enabling more complex particle manipulation techniques. Our work establishes a foundation for future explorations into the complex dynamics of novel structured acoustic fields in the ST domain.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) = -0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
Submitted 26 June, 2024;
originally announced June 2024.
-
Parameter Estimation for the Complex Fractional Ornstein-Uhlenbeck Processes with Hurst parameter H \in (0, 1/2)
Authors:
Fares Alazemi,
Abdulaziz Alsenafi,
Yong Chen,
Hongjuan Zhou
Abstract:
We study the strong consistency and asymptotic normality of a least squares estimator of the drift coefficient in complex-valued Ornstein-Uhlenbeck processes driven by fractional Brownian motion, extending the results of Chen, Hu, Wang (2017) to the case of Hurst parameter H \in (1/4 , 1/2) and the results of Hu, Nualart, Zhou (2019) to a two-dimensional case. When H \in (0, 1/4], it is found that the integrand of the estimator is not in the domain of the standard divergence operator. To facilitate the proofs, we develop a new inner product formula for functions of bounded variation in the reproducing kernel Hilbert space of fractional Brownian motion with Hurst parameter H \in (0, 1/2). This formula is also applied to obtain the second moments of the so-called α-order fractional Brownian motion and the α-fractional bridges with the Hurst parameter H \in (0, 1/2).
Submitted 25 June, 2024;
originally announced June 2024.
-
Enabling Regional Explainability by Automatic and Model-agnostic Rule Extraction
Authors:
Yu Chen,
Tianyu Cui,
Alexander Capstick,
Nan Fletcher-Loyd,
Payam Barnaghi
Abstract:
In Explainable AI, rule extraction translates model knowledge into logical rules, such as IF-THEN statements, crucial for understanding patterns learned by black-box models. This could significantly aid in fields like disease diagnosis, disease progression estimation, or drug discovery. However, such application domains often contain imbalanced data, with the class of interest underrepresented. Existing methods inevitably compromise the performance of rules for the minor class to maximise the overall performance. As the first attempt in this field, we propose a model-agnostic approach for extracting rules from specific subgroups of data, featuring automatic rule generation for numerical features. This method enhances the regional explainability of machine learning models and offers wider applicability compared to existing methods. We additionally introduce a new method for selecting features to compose rules, reducing computational costs in high-dimensional spaces. Experiments across various datasets and models demonstrate the effectiveness of our methods.
Submitted 25 June, 2024;
originally announced June 2024.
-
Practical identifiability and parameter estimation of compartmental epidemiological models
Authors:
Q. Y. Chen,
Z. Rapti,
Y. Drossinos,
J. Cuevas-Maraver,
G. A. Kevrekidis,
P. G. Kevrekidis
Abstract:
Practical parameter identifiability in ODE-based epidemiological models is a known issue, yet one that merits further study. It is essentially ubiquitous due to noise and errors in real data. In this study, to avoid uncertainty stemming from data of unknown quality, simulated data with added noise are used to investigate practical identifiability in two distinct epidemiological models. Particular emphasis is placed on the role of initial conditions, which are assumed unknown, except those that are directly measured. Instead of just focusing on one method of estimation, we use and compare results from various broadly used methods, including maximum likelihood and Markov Chain Monte Carlo (MCMC) estimation.
Among other findings, our analysis revealed that the MCMC estimator is overall more robust than the point estimators considered. Its estimates and predictions are improved when the initial conditions of certain compartments are fixed so that the model becomes globally identifiable. For the point estimators, whether fixing or fitting the initial conditions that are not directly measured improves parameter estimates is model-dependent. Specifically, in the standard SEIR model, fixing the initial condition for the susceptible population S(0) improved parameter estimates, while this was not true when fixing the initial condition of the asymptomatic population in a more involved model. Our study corroborates the change in quality of parameter estimates upon usage of pre-peak or post-peak time-series under consideration. Finally, our examples suggest that in the presence of significantly noisy data, the value of structural identifiability is moot.
Submitted 25 June, 2024;
originally announced June 2024.
-
Fully heavy tetraquark resonant states with different flavors
Authors:
Wei-Lin Wu,
Yao Ma,
Yan-Ke Chen,
Lu Meng,
Shi-Lin Zhu
Abstract:
We use the quark potential model to calculate the mass spectrum of the S-wave fully heavy tetraquark systems with different flavors, including the $ bc\bar b\bar c, bb\bar c\bar c, cc\bar c\bar b $ and $ bb\bar b\bar c $ systems. We employ the Gaussian expansion method to solve the four-body Schrödinger equation, and the complex scaling method to identify resonant states. The $ bc\bar b\bar c, bb\bar c\bar c, cc\bar c\bar b $ and $ bb\bar b\bar c $ resonant states are obtained in the mass regions of $ (13.2,13.5) $, $ (13.3,13.6) $, $ (10.0,10.3) $, $ (16.5,16.7) $ GeV, respectively. Among these states, the $ bc\bar b\bar c $ tetraquark states are the most promising ones to be discovered in the near future. We recommend the experimental exploration of the $ 1^{++} $ and $ 2^{++} $ $ bc\bar b\bar c $ states with masses near $ 13.3 $ GeV in the $ J/ψΥ$ channel. From the root-mean-square radii, we find that all the resonant states we have identified are compact tetraquark states.
Submitted 25 June, 2024;
originally announced June 2024.
-
A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al
Authors:
Ji Yan,
Jiwei Li,
X. T. He,
Lifeng Wang,
Yaohua Chen,
Feng Wang,
Xiaoying Han,
Kaiqiang Pan,
Juxi Liang,
Yulong Li,
Zanyang Guan,
Xiangming Liu,
Xingsen Che,
Zhong**g Chen,
Xing Zhang,
Yan Xu,
Bin Li,
Minging He,
Hongbo Cai,
Liang. Hao,
Zhanjun Liu,
Chunyang Zheng,
Zhensheng Dai,
Zhengfeng Fan,
Bin Qiao
, et al. (4 additional authors not shown)
Abstract:
A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Phonon Heat Transport and Anisotropic Tuning of Quantum Fluctuations in a Frustrated Honeycomb Magnet
Authors:
Haoran Fan,
Yue Chen,
Yuchen Gu,
Yuan Li,
Xi Lin
Abstract:
Honeycomb cobalt oxides containing 3$\it{d}$ Co$^{2+}$ ions might realize frustrated magnetism and novel quantum phases. Among candidate materials, Na$_{3}$Co$_{2}$SbO$_{6}$ stands out for its distorted honeycomb lattice and significant in-plane anisotropy, motivating vector-field tuning inside the honeycomb plane. Here we use thermal transport down to the mK regime to study twin-free crystals of Na$_{3}$Co$_{2}$SbO$_{6}$ subject to in-plane vector fields. We find that the thermal conductivity $κ$ never exceeds the heat-transport capability of phonons, rendering its suppression primarily due to phonon scattering off magnetic excitations and/or domain boundaries. The system's field-driven quantum criticality manifests itself as an abundance of magnetic fluctuations hindering the heat transport, which further depends on the field direction in an intriguing manner.
Submitted 25 June, 2024;
originally announced June 2024.
-
Dual-Space Knowledge Distillation for Large Language Models
Authors:
Songming Zhang,
Xue Zhang,
Zengkui Sun,
Yufeng Chen,
**an Xu
Abstract:
Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the two models so that more knowledge can be transferred. However, in the current white-box KD framework, the output distributions are from the respective output spaces of the two models, using their own prediction heads. We argue that the space discrepancy will lead to low similarity between the teacher model and the student model on both representation and distribution levels. Furthermore, this discrepancy also hinders the KD process between models with different vocabularies, which is common for current LLMs. To address these issues, we propose a dual-space knowledge distillation (DSKD) framework that unifies the output spaces of the two models for KD. On the basis of DSKD, we further develop a cross-model attention mechanism, which can automatically align the representations of the two models with different vocabularies. Thus, our framework is not only compatible with various distance functions for KD (e.g., KL divergence) like the current framework, but also supports KD between any two LLMs regardless of their vocabularies. Experiments on task-agnostic instruction-following benchmarks show that DSKD significantly outperforms the current white-box KD framework with various distance functions, and also surpasses existing KD methods for LLMs with different vocabularies.
Submitted 25 June, 2024;
originally announced June 2024.
-
Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows
Authors:
Yifan Chen,
Daniel Zhengyu Huang,
Jiaoyang Huang,
Sebastian Reich,
Andrew M. Stuart
Abstract:
In this paper, we study efficient approximate sampling for probability distributions known up to normalization constants. We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications. The computational challenges we address with the proposed methodology are: (i) the need for repeated evaluations of expensive forward models; (ii) the potential existence of multiple modes; and (iii) the fact that the gradient of, or an adjoint solver for, the forward model might not be feasible.
While existing Bayesian inference methods meet some of these challenges individually, we propose a framework that tackles all three systematically. Our approach builds upon the Fisher-Rao gradient flow in probability space, yielding a dynamical system for probability densities that converges towards the target distribution at a uniform exponential rate. This rapid convergence is advantageous for the computational burden outlined in (i). We apply Gaussian mixture approximations with operator splitting techniques to simulate the flow numerically; the resulting approximation can capture multiple modes thus addressing (ii). Furthermore, we employ the Kalman methodology to facilitate a derivative-free update of these Gaussian components and their respective weights, addressing the issue in (iii).
The proposed methodology results in an efficient derivative-free sampler flexible enough to handle multi-modal distributions: Gaussian Mixture Kalman Inversion (GMKI). The effectiveness of GMKI is demonstrated both theoretically and numerically in several experiments with multimodal target distributions, including proof-of-concept and two-dimensional examples, as well as a large-scale application: recovering the Navier-Stokes initial condition from solution data at positive times.
Submitted 25 June, 2024;
originally announced June 2024.
-
TopoGCL: Topological Graph Contrastive Learning
Authors:
Yuzhou Chen,
Jose Frias,
Yulia R. Gel
Abstract:
Graph contrastive learning (GCL) has recently emerged as a new concept which allows for capitalizing on the strengths of graph neural networks (GNNs) to learn rich representations in a wide variety of applications which involve abundant unlabeled information. However, existing GCL approaches largely tend to overlook the important latent information on higher-order graph substructures. We address this limitation by introducing the concepts of topological invariance and extended persistence on graphs to GCL. In particular, we propose a new contrastive mode which targets topological representations of the two augmented views from the same graph, yielded by extracting latent shape properties of the graph at multiple resolutions. Along with the extended topological layer, we introduce a new extended persistence summary, namely, extended persistence landscapes (EPL) and derive its theoretical stability guarantees. Our extensive numerical results on biological, chemical, and social interaction graphs show that the new Topological Graph Contrastive Learning (TopoGCL) model delivers significant performance gains in unsupervised graph classification for 11 out of 12 considered datasets and also exhibits robustness under noisy scenarios.
Submitted 24 June, 2024;
originally announced June 2024.
-
CogMG: Collaborative Augmentation Between Large Language Model and Knowledge Graph
Authors:
Tong Zhou,
Yubo Chen,
Kang Liu,
Jun Zhao
Abstract:
Large language models have become integral to question-answering applications despite their propensity for generating hallucinations and factually inaccurate content. Querying knowledge graphs to reduce hallucinations in LLM meets the challenge of incomplete knowledge coverage in knowledge graphs. On the other hand, updating knowledge graphs by information extraction and knowledge graph completion faces the knowledge update misalignment issue. In this work, we introduce a collaborative augmentation framework, CogMG, leveraging knowledge graphs to address the limitations of LLMs in QA scenarios, explicitly targeting the problems of incomplete knowledge coverage and knowledge update misalignment. The LLMs identify and decompose required knowledge triples that are not present in the KG, enriching them and aligning updates with real-world demands. We demonstrate the efficacy of this approach through a supervised fine-tuned LLM within an agent framework, showing significant improvements in reducing hallucinations and enhancing factual accuracy in QA responses. Our code and video are publicly available.
Submitted 24 June, 2024;
originally announced June 2024.