-
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Authors:
Jiayi Yuan,
Hongyi Liu,
Shaochen Zhong,
Yu-Neng Chuang,
Songchen Li,
Guanchu Wang,
Duy Le,
Hongye Jin,
Vipin Chaudhary,
Zhaozhuo Xu,
Zirui Liu,
Xia Hu
Abstract:
Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the growing size of the KV cache and the intrinsic complexity of attending to extended inputs. Multiple schools of efficiency-driven approaches -- such as KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures -- have been proposed to produce efficient yet long context-capable models. Despite these advancements, no existing work has comprehensively benchmarked these methods in a reasonably aligned environment. In this work, we fill this gap by providing a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks. Our work reveals numerous previously unknown phenomena and offers insights -- as well as a friendly workbench -- for the future development of long context-capable LLMs. The source code will be available at https://github.com/henryzhongsc/longctx_bench
Submitted 1 July, 2024;
originally announced July 2024.
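The KV cache pressure described in this abstract can be made concrete with back-of-the-envelope arithmetic. The sketch below is a minimal illustration, not code from the benchmark; the model shape (32 layers, 32 KV heads, head dimension 128) is an assumed 7B-class configuration, and the 4-bit figure ignores the scale and zero-point overhead that real quantization schemes add.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bits_per_elem: int = 16) -> int:
    """Approximate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing one key and one value
    vector per KV head, per layer, per cached token.
    """
    elems = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return elems * bits_per_elem // 8


if __name__ == "__main__":
    # Assumed 7B-class shape; substitute the real config of the model under study.
    shape = dict(num_layers=32, num_kv_heads=32, head_dim=128)
    gib = 2 ** 30
    for bits in (16, 4):
        size = kv_cache_bytes(**shape, seq_len=32_768, bits_per_elem=bits)
        print(f"{bits}-bit KV cache for 32k tokens: {size / gib:.1f} GiB")
```

Under these assumptions a single 32k-token sequence needs 16 GiB of fp16 cache, comparable to the fp16 weights of a 7B-parameter model (roughly 14 GB). KV cache quantization shrinks `bits_per_elem`, whereas token dropping and prompt compression shrink the effective `seq_len`.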
-
Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement
Authors:
Zisu Huang,
Xiaohua Wang,
Feiran Zhang,
Zhibo Xu,
Cenyuan Zhang,
Xiaoqing Zheng,
Xuanjing Huang
Abstract:
The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses heavily relies on the quality of user prompts. However, these prompts often tend to be brief and vague, thereby significantly limiting the full potential of LLMs. Moreover, harmful prompts can be meticulously crafted and manipulated by adversaries to jailbreak LLMs, inducing them to produce potentially toxic content. To enhance the capabilities of LLMs while maintaining strong robustness against harmful jailbreak inputs, this study proposes a transferable and pluggable framework that refines user prompts before they are input into LLMs. This strategy improves the quality of the queries, empowering LLMs to generate more truthful, benign and useful responses. Specifically, a lightweight query refinement model is introduced and trained using a specially designed reinforcement learning approach that incorporates multiple objectives to enhance particular capabilities of LLMs. Extensive experiments demonstrate that the refinement model not only improves the quality of responses but also strengthens their robustness against jailbreak attacks. Code is available at: https://github.com/Huangzisu/query-refinement .
Submitted 1 July, 2024;
originally announced July 2024.
-
Preserving Relative Localization of FoV-Limited Drone Swarm via Active Mutual Observation
Authors:
Lianjie Guo,
Zaitian Gongye,
Ziyi Xu,
Yingjian Wang,
Xin Zhou,
Jinni Zhou,
Fei Gao
Abstract:
Relative state estimation is crucial for vision-based swarms to estimate and compensate for the unavoidable drift of visual odometry. For autonomous drones equipped with the most compact sensor setting -- a stereo camera that provides a limited field of view (FoV) -- the demand for mutual observation for relative state estimation conflicts with the demand for environment observation. To balance these two demands in FoV-limited swarms, acquiring mutual observations while guaranteeing safety, this paper proposes an active localization correction system that plans camera orientations via a yaw planner during flight. The yaw planner resolves the conflict by computing suitable timing and yaw angle commands based on the localization uncertainty estimated by a Kalman filter. Simulation validates the scalability of our algorithm. In real-world experiments, we reduce positioning drift by up to 65% and maintain a given formation in both indoor and outdoor GPS-denied flight, verifying the accuracy, efficiency, and robustness of the proposed system.
Submitted 1 July, 2024;
originally announced July 2024.
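The abstract ties yaw commands to localization uncertainty estimated by a Kalman filter. As a hedged illustration of why an uncertainty-triggered observation works (not the authors' yaw planner), the scalar sketch below lets relative-position variance grow by an assumed process noise `q` each step and, whenever it exceeds a hypothetical `threshold`, applies a mutual-observation measurement update with assumed measurement noise `r`.

```python
from typing import List, Tuple


def kf_predict(var: float, q: float) -> float:
    """Prediction step: uncertainty grows by the process noise q."""
    return var + q


def kf_update(var: float, r: float) -> float:
    """Scalar measurement update with observation model H = 1."""
    gain = var / (var + r)      # Kalman gain
    return (1.0 - gain) * var   # posterior variance = var * r / (var + r)


def simulate(steps: int, q: float, r: float, threshold: float,
             var0: float = 0.0) -> Tuple[List[float], List[int]]:
    """Return the variance history and the steps at which an observation was made."""
    var = var0
    history: List[float] = []
    observed_at: List[int] = []
    for t in range(steps):
        var = kf_predict(var, q)
        if var > threshold:  # hypothetical rule: turn the camera toward a teammate
            var = kf_update(var, r)
            observed_at.append(t)
        history.append(var)
    return history, observed_at


if __name__ == "__main__":
    hist, obs = simulate(steps=8, q=0.1, r=0.1, threshold=0.25)
    print("observations at steps:", obs)
    print("variance:", [round(v, 3) for v in hist])
```

Because the posterior variance `var * r / (var + r)` is always below `r`, any threshold above `r` produces periodic observations. The paper's planner additionally weighs timing, yaw angle, and safety, which this sketch omits.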
-
Searching for Best Practices in Retrieval-Augmented Generation
Authors:
Xiaohua Wang,
Zhenghua Wang,
Xuan Gao,
Feiran Zhang,
Yixin Wu,
Zhibo Xu,
Tianyuan Shi,
Zhengyuan Wang,
Shizheng Li,
Qi Qian,
Ruicheng Yin,
Changze Lv,
Xiaoqing Zheng,
Xuanjing Huang
Abstract:
Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolonged response times. Typically, a RAG workflow involves multiple processing steps, each of which can be executed in various ways. Here, we investigate existing RAG approaches and their potential combinations to identify optimal RAG practices. Through extensive experiments, we suggest several strategies for deploying RAG that balance both performance and efficiency. Moreover, we demonstrate that multimodal retrieval techniques can significantly enhance question-answering capabilities about visual inputs and accelerate the generation of multimodal content using a "retrieval as generation" strategy.
Submitted 1 July, 2024;
originally announced July 2024.
-
IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation
Authors:
Senyu Han,
Lu Chen,
Li-Min Lin,
Zhengshan Xu,
Kai Yu
Abstract:
Large language models have demonstrated their capabilities in storyline creation and human-like character role-playing. Current language model agents mainly focus on reasonable behaviors at the level of individuals, and their behaviors can be hard to constrain at the level of the whole storyline. In this paper we introduce IBSEN, a director-actor coordinated agent framework that generates drama scripts and makes the plot played by agents more controllable. The director agent writes plot outlines that the user desires to see, instructs the actor agents to role-play their characters, and reschedules the plot when human players participate in the scenario to ensure the plot progresses towards the objective. To evaluate the framework, we create a novel drama plot that involves several actor agents and check the interactions between them under the instruction of the director agent. Evaluation results show that our framework can generate complete, diverse drama scripts from only a rough outline of plot objectives, while maintaining the characteristics of the characters in the drama. Our code and prompts are available at https://github.com/OpenDFM/ibsen.
Submitted 1 July, 2024;
originally announced July 2024.
-
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models
Authors:
Ruinan Jin,
Zikang Xu,
Yuan Zhong,
Qiongsong Yao,
Qi Dou,
S. Kevin Zhou,
Xiaoxiao Li
Abstract:
The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmarks, standardized pipelines, and easily adaptable libraries to evaluate and understand the fairness performance of FMs in medical imaging, leading to considerable challenges in formulating and implementing solutions that ensure equitable outcomes across diverse patient populations. To fill this gap, we introduce FairMedFM, a fairness benchmark for FM research in medical imaging. FairMedFM integrates with 17 popular medical imaging datasets, encompassing different modalities, dimensionalities, and sensitive attributes. It explores 20 widely used FMs under various usages, such as zero-shot learning, linear probing, parameter-efficient fine-tuning, and prompting, in downstream classification and segmentation tasks. Our exhaustive analysis evaluates fairness performance over different evaluation metrics from multiple perspectives, revealing the existence of bias, varied utility-fairness trade-offs across FMs, consistent disparities on the same datasets regardless of the FM, and limited effectiveness of existing unfairness mitigation methods. Check out FairMedFM's project page and open-source codebase, which supports extensible functionality and applications and is intended to support inclusive, long-term studies of FMs in medical imaging.
Submitted 1 July, 2024;
originally announced July 2024.
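Fairness benchmarks of this kind typically compare a utility metric across sensitive groups. The sketch below computes one common summary, the gap between the best-served and worst-served group's accuracy; it is an illustrative metric chosen here, not necessarily one FairMedFM reports, and the labels and group names are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                   groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy computed separately for each sensitive-attribute group."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true: Sequence[int], y_pred: Sequence[int],
                 groups: Sequence[Hashable]) -> float:
    """Max minus min group accuracy; 0.0 means equal accuracy across groups."""
    accs = group_accuracy(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 1, 1, 1, 0]
    sex = ["F", "F", "F", "M", "M", "M"]  # hypothetical sensitive attribute
    print(group_accuracy(y_true, y_pred, sex))
    print("accuracy gap:", accuracy_gap(y_true, y_pred, sex))
```

Reporting such a gap next to overall accuracy is one way to make a utility-fairness trade-off visible: a model can raise mean accuracy while widening the gap between groups.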
-
CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction
Authors:
Jingheng Ye,
Zishan Xu,
Yinghui Li,
Xuxin Cheng,
Linlin Song,
Qingyu Zhou,
Hai-Tao Zheng,
Ying Shen,
Xin Su
Abstract:
The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. Together, these dimensions reveal the critical characteristics of GEC systems and locate their drawbacks. Evaluating systems by combining these dimensions achieves higher consistency with human judgements than other reference-based and reference-less metrics. Extensive experiments on 2 human judgement datasets and 6 reference datasets demonstrate the effectiveness and robustness of our method. All code will be released after peer review.
Submitted 30 June, 2024;
originally announced July 2024.
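To make the four dimensions concrete, the sketch below classifies hypothesis edits against reference edits by span overlap. It is a deliberately simplified reading of the dimension names, assumed for illustration; CLEME2.0's actual edit disentangling and weighting are more involved, and the `Edit` type and example sentence are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Edit(NamedTuple):
    start: int  # first source token covered by the edit (inclusive)
    end: int    # token after the edited span (exclusive); start == end is an insertion
    text: str   # replacement text


def overlaps(a: Edit, b: Edit) -> bool:
    """True if two edits touch the same source span."""
    if (a.start, a.end) == (b.start, b.end):
        return True  # identical spans, including insertions at the same position
    return a.start < b.end and b.start < a.end


def classify(hyp: List[Edit], ref: List[Edit]) -> Dict[str, int]:
    counts = {"hit": 0, "error": 0, "under": 0, "over": 0}
    ref_set = set(ref)
    for h in hyp:
        if h in ref_set:
            counts["hit"] += 1        # same span, same correction
        elif any(overlaps(h, r) for r in ref):
            counts["error"] += 1      # right place, wrong correction
        else:
            counts["over"] += 1       # edited something the reference leaves alone
    for r in ref:
        if not any(overlaps(h, r) for h in hyp):
            counts["under"] += 1      # reference correction the system missed
    return counts


if __name__ == "__main__":
    # Source tokens: She(0) go(1) to(2) the(3) school(4) in(5) yesterday(6)
    ref = [Edit(1, 2, "went"), Edit(3, 4, ""), Edit(5, 6, "")]
    hyp = [Edit(1, 2, "went"), Edit(3, 4, "a"), Edit(6, 7, "today")]
    print(classify(hyp, ref))
```

In the example each dimension fires once: "went" is a hit, replacing "the" with "a" instead of deleting it is an error-correction, the missed deletion of "in" is an under-correction, and rewriting "yesterday" is an over-correction.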
-
EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction
Authors:
Jingheng Ye,
Shang Qin,
Yinghui Li,
Xuxin Cheng,
Libo Qin,
Hai-Tao Zheng,
Peng Xing,
Zishan Xu,
Guo Cheng,
Zhao Wei
Abstract:
Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for Chinese EXGEC consisting of 8,216 explanation-augmented samples featuring the design of hybrid edit-wise explanations. We benchmark several series of LLMs in multiple settings, covering post-explaining and pre-explaining. To promote the development of the task, we introduce a comprehensive suite of automatic metrics and conduct human evaluation experiments to demonstrate the human consistency of the automatic metrics for free-text explanations. All code and data will be released after the review.
Submitted 30 June, 2024;
originally announced July 2024.
-
An Interpretable Alternative to Neural Representation Learning for Rating Prediction -- Transparent Latent Class Modeling of User Reviews
Authors:
Giuseppe Serra,
Peter Tino,
Zhao Xu,
Xin Yao
Abstract:
Nowadays, neural network (NN) and deep learning (DL) techniques are widely adopted in many applications, including recommender systems. Given the sparse and stochastic nature of collaborative filtering (CF) data, recent works have critically analyzed the effective improvement of neural-based approaches compared to simpler and often transparent algorithms for recommendation. Previous results showed that NN and DL models can be outperformed by traditional algorithms in many tasks. Moreover, given the largely black-box nature of neural-based methods, interpretable results are not naturally obtained. Following on this debate, we first present a transparent probabilistic model that topologically organizes user and product latent classes based on the review information. In contrast to popular neural techniques for representation learning, we readily obtain a statistical, visualization-friendly tool that can be easily inspected to understand user and product characteristics from a textual-based perspective. Then, given the limitations of common embedding techniques, we investigate the possibility of using the estimated interpretable quantities as model input for a rating prediction task. To contribute to the recent debates, we evaluate our results in terms of both capacity for interpretability and predictive performance in comparison with popular text-based neural approaches. The results demonstrate that the proposed latent class representations can yield competitive predictive performance, compared to popular, but difficult-to-interpret approaches.
Submitted 17 June, 2024;
originally announced July 2024.
-
Supercharging Federated Learning with Flower and NVIDIA FLARE
Authors:
Holger R. Roth,
Daniel J. Beutel,
Yan Cheng,
Javier Fernandez Marques,
Heng Pan,
Chester Chen,
Zhihong Zhang,
Yuhong Wen,
Sean Yang,
Isaac Yang,
Yuan-Ting Hsieh,
Ziyue Xu,
Daguang Xu,
Nicholas D. Lane,
Andrew Feng
Abstract:
Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years while focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in research and industry. Conversely, FLARE has prioritized the creation of an enterprise-ready, resilient runtime environment explicitly designed for FL applications in production environments. In this paper, we describe our initial integration of both frameworks and show how they can work together to supercharge the FL ecosystem as a whole. Through the seamless integration of Flower and FLARE, applications crafted within the Flower framework can effortlessly operate within the FLARE runtime environment without necessitating any modifications. This initial integration streamlines the process, eliminating complexities and ensuring smooth interoperability between the two platforms, thus enhancing the overall efficiency and accessibility of FL applications.
Submitted 21 May, 2024;
originally announced July 2024.
-
StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction
Authors:
Jiaheng Zhuang,
Guoan Wang,
Siyu Zhang,
Xiyang Wang,
Hangning Zhou,
Ziyao Xu,
Chi Zhang,
Zhiheng Li
Abstract:
3D multi-object tracking and trajectory prediction are two crucial modules in autonomous driving systems. Generally, the two tasks are handled separately in traditional paradigms, and only recently have a few methods started to model them jointly. However, these approaches suffer from the limitations of single-frame training and inconsistent coordinate representations between the tracking and prediction tasks. In this paper, we propose a streaming and unified framework for joint 3D Multi-Object Tracking and trajectory Prediction (StreamMOTP) to address these challenges. First, we construct the model in a streaming manner and exploit a memory bank to preserve and leverage the long-term latent features of tracked objects more effectively. Second, a relative spatio-temporal positional encoding strategy is introduced to bridge the gap in coordinate representations between the two tasks and maintain pose-invariance for trajectory prediction. Third, we further improve the quality and consistency of predicted trajectories with a dual-stream predictor. We conduct extensive experiments on the popular nuScenes dataset, and the experimental results demonstrate the effectiveness and superiority of StreamMOTP, which significantly outperforms previous methods on both tasks. Furthermore, we show that the proposed framework has great potential and advantages for practical autonomous driving applications.
Submitted 28 June, 2024;
originally announced June 2024.
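The pose-invariance point can be illustrated with the coordinate change that relative encodings build on: expressing each trajectory in the tracked agent's own frame. The sketch below is an assumption-level illustration, not StreamMOTP's learned positional encoding; it only shows that agent-frame coordinates stay the same when the whole scene is rotated and translated. The trajectory and transform values are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_agent_frame(points: Sequence[Point], origin: Point, heading: float) -> List[Point]:
    """Translate by -origin, then rotate by -heading (radians)."""
    c, s = math.cos(-heading), math.sin(-heading)
    out = []
    for x, y in points:
        dx, dy = x - origin[0], y - origin[1]
        out.append((c * dx - s * dy, s * dx + c * dy))
    return out


def transform_world(p: Point, theta: float, shift: Point) -> Point:
    """Rotate a world-frame point by theta about the origin, then translate it."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + shift[0], s * p[0] + c * p[1] + shift[1])


if __name__ == "__main__":
    history = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]  # hypothetical past positions
    pose, heading = history[-1], 0.3
    local = to_agent_frame(history, pose, heading)

    theta, shift = 1.1, (10.0, -4.0)  # arbitrary change of world frame
    moved = [transform_world(p, theta, shift) for p in history]
    local_moved = to_agent_frame(moved, transform_world(pose, theta, shift), heading + theta)
    print("max difference:", max(abs(a - b) for u, v in zip(local, local_moved) for a, b in zip(u, v)))
```

A predictor that consumes such agent-centric inputs cannot be misled by where the scene sits or how it is oriented in global coordinates, which is the property the abstract calls pose-invariance.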
-
Involves averaging arithmetic and integral partial functions over sparse set
Authors:
Zhaoxi Ye,
Zhefeng Xu
Abstract:
Let $p$ be a prime number, $k\ge 0$, and let $f$ belong to a class of arithmetic functions satisfying some simple conditions. In this short paper, we study the asymptotic behaviour of the summation functions $$\psi_{f,k}(x):=\sum_{n\le x}\Lambda(n)\frac{f\left(\left[\frac{x}{n}\right]\right)}{\left[\frac{x}{n}\right]^{k}},\qquad \pi_{f,k}(x):=\sum_{p\le x}\frac{f\left(\left[\frac{x}{p}\right]\right)}{\left[\frac{x}{p}\right]^{k}}$$ as $x\to\infty$, where $[\cdot]$ is the integral part function and $\Lambda(n)$ is the von Mangoldt function.
Submitted 28 June, 2024;
originally announced June 2024.
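For small $x$ the two sums can be evaluated directly, which is a convenient sanity check before studying their asymptotics. The sketch below is not from the paper; it restricts $x$ to positive integers, which loses nothing because $[x/n] = [[x]/n]$ for every positive integer $n$, and the choices of $f$ are illustrative. With $f \equiv 1$ and $k = 0$ the sums reduce to Chebyshev's $\psi(x)$ and the prime-counting function $\pi(x)$.

```python
import math
from typing import Callable


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n (n >= 2), by trial division."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n is a power of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = smallest_prime_factor(n)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


def is_prime(n: int) -> bool:
    return n >= 2 and smallest_prime_factor(n) == n


def psi_fk(x: int, f: Callable[[int], float], k: float) -> float:
    """psi_{f,k}(x) = sum_{n <= x} Lambda(n) f([x/n]) / [x/n]^k."""
    return sum(von_mangoldt(n) * f(x // n) / (x // n) ** k for n in range(1, x + 1))


def pi_fk(x: int, f: Callable[[int], float], k: float) -> float:
    """pi_{f,k}(x) = sum over primes p <= x of f([x/p]) / [x/p]^k."""
    return sum(f(x // p) / (x // p) ** k for p in range(2, x + 1) if is_prime(p))


if __name__ == "__main__":
    one = lambda m: 1
    print(psi_fk(10, one, 0), math.log(2520))  # psi(10) = log lcm(1, ..., 10)
    print(pi_fk(10, one, 0))                   # pi(10) counts 2, 3, 5, 7
```

Since $[x/n] \ge 1$ whenever $n \le x$, the denominators never vanish for any $k \ge 0$.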
-
CUPID: Improving Battle Fairness and Position Satisfaction in Online MOBA Games with a Re-matchmaking System
Authors:
Ge Fan,
Chaoyun Zhang,
Kai Wang,
Yingjie Li,
Junyang Chen,
Zenglin Xu
Abstract:
The multiplayer online battle arena (MOBA) genre has gained significant popularity and economic success, attracting considerable research interest within the Human-Computer Interaction community. Enhancing the gaming experience requires a deep understanding of player behavior, and a crucial aspect of MOBA games is matchmaking, which aims to assemble teams of comparable skill levels. However, existing matchmaking systems often neglect important factors such as players' position preferences and team assignment, resulting in imbalanced matches and reduced player satisfaction. To address these limitations, this paper proposes a novel framework called CUPID, which introduces a process called "re-matchmaking" to optimize team and position assignments and thereby improve both fairness and player satisfaction. CUPID incorporates a pre-filtering step to ensure a minimum level of matchmaking quality, followed by a pre-match win-rate prediction model that evaluates the fairness of potential assignments. By simultaneously considering players' position satisfaction and game fairness, CUPID aims to provide an enhanced matchmaking experience. Extensive experiments were conducted on two large-scale, real-world MOBA datasets to validate the effectiveness of CUPID. CUPID outperforms all existing state-of-the-art baselines, with an average relative improvement of 7.18% in win prediction accuracy. Furthermore, CUPID has been successfully deployed in a popular online mobile MOBA game. The deployment resulted in significant improvements in match fairness and player satisfaction, as evidenced by critical Human-Computer Interaction (HCI) metrics covering usability, accessibility, and engagement, observed through A/B testing. To the best of our knowledge, CUPID is the first re-matchmaking system designed specifically for large-scale MOBA games.
Submitted 28 June, 2024;
originally announced June 2024.
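The fairness half of re-matchmaking can be sketched as a search over team assignments scored by a win-rate predictor. The code below is a hypothetical stand-in, not CUPID's pipeline: it uses an Elo-style logistic curve on mean team rating in place of the learned pre-match predictor, ignores position preferences and the pre-filtering step, expects an even, non-zero number of players, and the ratings are invented.

```python
from itertools import combinations
from typing import Sequence, Tuple


def win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo-style probability that side A beats side B (stand-in for a learned predictor)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def fairest_split(ratings: Sequence[float]) -> Tuple[float, Tuple[int, ...], Tuple[int, ...]]:
    """Return (|p - 0.5|, team_a, team_b) for the most balanced even split of players."""
    n = len(ratings)
    players = range(n)
    best = None
    for team_a in combinations(players, n // 2):
        if 0 not in team_a:
            continue  # pin player 0 to team A so mirrored splits are not scored twice
        team_b = tuple(p for p in players if p not in team_a)
        mean_a = sum(ratings[p] for p in team_a) / len(team_a)
        mean_b = sum(ratings[p] for p in team_b) / len(team_b)
        gap = abs(win_prob(mean_a, mean_b) - 0.5)  # 0.0 means a predicted coin flip
        if best is None or gap < best[0]:
            best = (gap, team_a, team_b)
    return best


if __name__ == "__main__":
    ratings = [1600, 1550, 1500, 1480, 1450, 1420, 1400, 1380, 1350, 1300]
    gap, team_a, team_b = fairest_split(ratings)
    print(team_a, team_b, round(gap, 4))
```

Exhaustive search is cheap here (126 distinct 5v5 splits); adding position assignment multiplies the search space, which is one reason a pre-filtering step to prune candidates is useful.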
-
MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
Authors:
Jinming Li,
Yichen Zhu,
Zhiyuan Xu,
Jindong Gu,
Minjie Zhu,
Xin Liu,
Ning Liu,
Yaxin Peng,
Feifei Feng,
Jian Tang
Abstract:
It is fundamentally challenging for robots to serve as useful assistants in human environments because this requires addressing a spectrum of sub-problems across robotics, including perception, language understanding, reasoning, and planning. Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated exceptional abilities in solving complex mathematical problems and in commonsense and abstract reasoning. This has led to the recent use of MLLMs as the brain in robotic systems, enabling these models to conduct high-level planning before triggering low-level control actions for task execution. However, it remains uncertain whether existing MLLMs are reliable enough to serve as the brain of robots. In this study, we introduce MMRo (Multimodal LLM for Robotics), the first benchmark for evaluating the capability of MLLMs for robot applications. Specifically, we identify four essential capabilities that MLLMs must possess to qualify as the robot's central processing unit: perception, task planning, visual reasoning, and safety measurement. We have developed several scenarios for each capability, resulting in a total of 14 metrics for evaluation. We present experimental results for various MLLMs, including both commercial and open-source models, to assess the performance of existing systems. Our findings indicate that no single model excels in all areas, suggesting that current MLLMs are not yet trustworthy enough to serve as the cognitive core for robots. Our data can be found at https://mm-robobench.github.io/.
Submitted 28 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+\nu_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226 GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+\nu_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+\nu_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
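Each result quotes a statistical and a systematic uncertainty. A common convention, assumed here rather than taken from the paper, is to combine independent uncertainties in quadrature to obtain a single total and a relative precision; the short sketch below does this for the two quoted results.

```python
import math


def total_uncertainty(stat: float, syst: float) -> float:
    """Quadrature sum, valid if the two uncertainties are independent."""
    return math.hypot(stat, syst)


if __name__ == "__main__":
    results = {
        "B(D_s+ -> K0 e+ nu_e) [1e-3]": (2.98, 0.23, 0.12),
        "f_+^{K0}(0)": (0.636, 0.049, 0.013),
    }
    for name, (value, stat, syst) in results.items():
        tot = total_uncertainty(stat, syst)
        print(f"{name}: {value} +/- {tot:.3f} ({100 * tot / value:.1f}% relative)")
```

Under that assumption the branching fraction is known to about 8.7% and the form factor to about 8.0%, with the statistical term dominating in both cases.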
-
STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis
Authors:
Wenbin Li,
Di Yao,
Ruibo Zhao,
Wenjie Chen,
Zijie Xu,
Chengxue Luo,
Chang Gong,
Quanliang Jing,
Haining Tan,
Jingping Bi
Abstract:
The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address this gap, this paper dissects the spatio-temporal capability of LLMs into four distinct dimensions: knowledge comprehension, spatio-temporal reasoning, accurate computation, and downstream applications. We curate several natural language question-answer tasks for each category and build the benchmark dataset, namely STBench, containing 13 distinct tasks and over 60,000 QA pairs. Moreover, we assess the capabilities of 13 LLMs, such as GPT-4o, Gemma, and Mistral. Experimental results reveal that existing LLMs perform remarkably well on knowledge comprehension and spatio-temporal reasoning tasks, with potential for further improvement on the other tasks through in-context learning, chain-of-thought prompting, and fine-tuning. The code and datasets of STBench are released at https://github.com/LwbXc/STBench.
Submitted 27 June, 2024;
originally announced June 2024.
-
CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI
Authors:
Zi Wang,
Fanwen Wang,
Chen Qin,
Jun Lyu,
Ouyang Cheng,
Shuo Wang,
Yan Li,
Mengyao Yu,
Haoyu Zhang,
Kunyuan Guo,
Zhang Shi,
Qirong Li,
Ziqiang Xu,
Yajing Zhang,
Hao Li,
Sha Hua,
Binghua Chen,
Longyu Sun,
Mengting Sun,
Qin Li,
Ying-Hua Chu,
Wenjia Bai,
Jing Qin,
Xiahai Zhuang,
Claudia Prieto
, et al. (7 additional authors not shown)
Abstract:
Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is expected to enable time-efficient and patient-friendly imaging, which in turn requires advanced image reconstruction approaches to recover high-quality, clinically interpretable images from undersampled measurements. However, the lack of publicly available cardiac MRI k-space datasets of sufficient quantity and diversity has severely hindered technological progress, particularly for data-driven artificial intelligence. Here, we provide a standardized, diverse, and high-quality CMRxRecon2024 dataset to facilitate the technical development, fair evaluation, and clinical transfer of cardiac MRI reconstruction approaches, and to promote universal frameworks that enable fast and robust reconstructions across different cardiac MRI protocols in clinical practice. To the best of our knowledge, the CMRxRecon2024 dataset is the largest and most diverse publicly available cardiac k-space dataset. It is acquired from 330 healthy volunteers, covering commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI workflows. In addition, an open platform with tutorials, benchmarks, and data processing tools is provided to facilitate data usage, advanced method development, and fair performance evaluation.
Submitted 27 June, 2024;
originally announced June 2024.
-
Evolution of Interfacial Hydration Structure Induced by Ion Condensation and Correlation Effects
Authors:
Han Li,
Zhi Xu,
Jiacheng Li,
Alessandro Siria,
Ming Ma
Abstract:
Interfacial hydration structures are crucial in a wide range of applications, including batteries, colloids and lubrication. Multivalent ions such as Mg2+ and La3+ play irreplaceable roles in these applications, which has been hypothesized to stem from their unique interfacial hydration structures. However, this hypothesis lacks experimental support. Here, using three-dimensional atomic force microscopy (3D-AFM), we provide the first observation of their interfacial hydration structures with molecular resolution. We observed the evolution of layered hydration structures at La(NO3)3 solution-mica interfaces with concentration. As the concentration increases from 25 mM to 2 M, the number of layers changes from 2 to 1 and back to 2, and the interlayer thickness rises from 0.25 to 0.34 nm, with the hydration force increasing from 0.27±0.07 to 1.04±0.24 nN. Theory and molecular simulation reveal that multivalence induces concentration-dependent ion condensation and correlation effects, resulting in compositional and structural evolution within interfacial hydration structures. Additional experiments with MgCl2-mica, La(NO3)3-graphite and Al(NO3)3-mica interfaces, together with comparison to the literature, confirm the universality of this mechanism for both multivalent and monovalent ions. New factors affecting interfacial hydration structures are revealed, including concentration and solvent dielectric constant. This insight provides guidance for designing interfacial hydration structures to optimize the solid-liquid interphase for battery life extension, modulate colloid stability and develop efficient lubricants.
Submitted 26 June, 2024;
originally announced June 2024.
-
Stability and Robustness of Time-discretization Schemes for the Allen-Cahn Equation via Bifurcation and Perturbation Analysis
Authors:
Wenrui Hao,
Sun Lee,
Xiaofeng Xu,
Zhiliang Xu
Abstract:
The Allen-Cahn equation is a fundamental model for phase transitions, offering critical insights into the dynamics of interface evolution in various physical systems. This paper investigates the stability and robustness of frequently used time-discretization schemes for solving the Allen-Cahn equation, with a focus on the Backward Euler, Crank-Nicolson (CN), convex splitting of modified CN, and Diagonally Implicit Runge-Kutta (DIRK) methods. Our stability analysis reveals that the convex splitting of the modified CN scheme is unconditionally stable, allowing greater flexibility in time-step selection, while the other schemes are conditionally stable. Additionally, our robustness analysis shows that the Backward Euler method converges to the correct physical solutions regardless of initial conditions. In contrast, the other methods studied in this work are sensitive to initial conditions and may converge to incorrect physical solutions if the initial conditions are not carefully chosen. This study introduces a comprehensive approach to assessing stability and robustness in numerical methods for solving the Allen-Cahn equation, providing a new perspective for evaluating numerical techniques for general nonlinear differential equations.
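The sketch below shows how a Backward Euler step works for this equation. It assumes the spatially homogeneous case, u' = (u - u³)/ε², with the Laplacian dropped. Each implicit step is solved with Newton's method. This is the textbook scheme only, not the paper's stability or robustness analysis.

```python
def backward_euler_allen_cahn(u0: float, dt: float, steps: int, eps: float = 1.0) -> float:
    """Backward Euler for u' = (u - u**3) / eps**2, with Newton's method
    solving the implicit equation v - dt*f(v) - u_n = 0 at every step."""
    u = u0
    for _ in range(steps):
        v = u  # Newton initial guess: the previous time level
        for _ in range(50):
            g = v - dt * (v - v**3) / eps**2 - u           # implicit residual
            dg = 1.0 - dt * (1.0 - 3.0 * v**2) / eps**2    # its derivative
            step = g / dg
            v -= step
            if abs(step) < 1e-12:
                break
        u = v
    return u
```

For dt < ε², the residual's derivative is at least 1 - dt/ε² > 0, so each step has exactly one root. Starting from u0 = 0.1 with dt = 0.1, the solution relaxes to the stable phase u = 1. The unstable equilibrium u = 0 is preserved exactly.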
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Start from Zero: Triple Set Prediction for Automatic Knowledge Graph Completion
Authors:
Wen Zhang,
Yajing Xu,
Peng Ye,
Zhiwei Huang,
Zezhong Xu,
Jiaoyan Chen,
Jeff Z. Pan,
Huajun Chen
Abstract:
Knowledge graph (KG) completion aims to find missing triples in a KG. Several tasks, such as link prediction and instance completion, have been proposed for KG completion. They are triple-level tasks in which some elements of a missing triple are given and the remaining element must be predicted. However, knowing some elements of the missing triple in advance is not always a realistic setting. In this paper, we propose a novel graph-level automatic KG completion task called Triple Set Prediction (TSP), which assumes that none of the elements in the missing triples is given. The goal of TSP is to predict a set of missing triples given a set of known triples. To evaluate this new task properly and accurately, we propose four evaluation metrics, comprising three classification metrics and one ranking metric, considering both the partial-open-world and the closed-world assumptions. Furthermore, to cope with the huge number of candidate triples, we propose a novel and efficient subgraph-based method, GPHT, that predicts the triple set quickly. To compare TSP results fairly, we also propose two types of methods, RuleTensor-TSP and KGE-TSP, which apply existing rule- and embedding-based methods to TSP as baselines. In our experiments, we evaluate the proposed methods on two datasets extracted from Wikidata following the relation-similarity partial-open-world assumption we propose, and we also create a complete family dataset to evaluate TSP results under the closed-world assumption. Results show that the methods can successfully generate a set of missing triples and achieve reasonable scores on the new task, and GPHT performs better than the baselines with significantly shorter prediction time. The datasets and code for experiments are available at https://github.com/zjukg/GPHT-for-TSP.
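The abstract does not name its three classification metrics. The sketch below assumes set-level precision, recall and F1 under the closed-world reading, where every predicted triple outside the gold set counts as wrong. The family-style triples are hypothetical, and the sketch is illustrative rather than the paper's metric definitions.

```python
Triple = tuple[str, str, str]  # (head, relation, tail)

def set_prediction_scores(predicted: set[Triple], gold: set[Triple]) -> dict[str, float]:
    """Precision, recall and F1 of a predicted triple set against a gold set,
    treating every triple outside `gold` as false (closed-world reading)."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

gold = {("alice", "parent_of", "bob"), ("bob", "parent_of", "carol"),
        ("alice", "grandparent_of", "carol"), ("carol", "sibling_of", "dave")}
predicted = {("alice", "parent_of", "bob"), ("bob", "parent_of", "carol"),
             ("dave", "parent_of", "alice"), ("bob", "sibling_of", "alice")}
scores = set_prediction_scores(predicted, gold)  # 2 hits of 4 predicted, 4 gold
```

Under a partial-open-world assumption, some predicted triples absent from `gold` could be correct. A plain set intersection would then underestimate precision, which is presumably why the paper treats the two assumptions separately.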
Submitted 26 June, 2024;
originally announced June 2024.
-
Delay Infectivity and Delay Recovery SIR model
Authors:
Christopher N. Angstmann,
Stuart-James M. Burney,
Anna V. McGann,
Zhuang Xu
Abstract:
We have derived the governing equations for an SIR model with delay terms in both the infectivity and the recovery of the disease. The equations are derived by modelling the dynamics as a continuous-time random walk, in which individuals move between the classic SIR compartments. With an appropriate choice of distributions for the infectivity and recovery processes, delay terms are introduced into the governing equations in a manner that ensures the physicality of the model. This provides novel insight into the underlying dynamics of an SIR model with time delays. The SIR model with delayed infectivity and recovery allows for a more diverse range of dynamical behaviours. The model accounts for an incubation effect without the need to introduce new compartments.
Submitted 26 June, 2024;
originally announced June 2024.
-
Towards full instanton trans-series in Hofstadter's butterfly
Authors:
Jie Gu,
Zhaojie Xu
Abstract:
The trans-series completion of perturbative series of a wide class of quantum mechanical systems can be determined by combining the resurgence program and extra input coming from exact WKB analysis. In this paper, we reexamine the Harper-Hofstadter model and its spectrum, Hofstadter's butterfly, in light of recent developments. We demonstrate the connection between the perturbative energy series of the Harper-Hofstadter model and the vev of $1/2$-BPS Wilson loop of 5d SYM and clarify the differences between their non-perturbative corrections. Taking insights from the cosine potential model, we construct the full energy trans-series for flux $φ=2π/Q$ and provide numerical evidence with remarkably high precision. Finally, we revisit the problem of self-similarity of the butterfly and discuss the possibility of a completed version of the Rammal-Wilkinson formula.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
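The asymmetry defined above is a normalized difference of the $K_S^0$ and $K_L^0$ branching fractions. The helper below computes it directly. Reproducing the quoted values would also need the known $K_S^0$ branching fractions, which are not listed here. The sketch therefore only shows the arithmetic and its sign convention, and ignores uncertainty propagation.

```python
def ks_kl_asymmetry(b_ks: float, b_kl: float) -> float:
    """R = (B(K_S X) - B(K_L X)) / (B(K_S X) + B(K_L X)).

    R = 0 means equal rates; R < 0 means the K_L mode is more frequent."""
    total = b_ks + b_kl
    if total <= 0:
        raise ValueError("branching fractions must sum to a positive value")
    return (b_ks - b_kl) / total
```

The measured values are all slightly negative but consistent with zero, which is what "no significant asymmetries" means in terms of this formula.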
Submitted 26 June, 2024;
originally announced June 2024.
-
Diagnosis Assistant for Liver Cancer Utilizing a Large Language Model with Three Types of Knowledge
Authors:
Xuzhou Wu,
Guangxin Li,
Xing Wang,
Zeyu Xu,
Yingni Wang,
Jianming Xian,
Xueyu Wang,
Gong Li,
Kehong Yuan
Abstract:
Liver cancer has a high incidence rate, but primary healthcare settings often lack experienced doctors. Advances in large models and AI technologies offer potential assistance. This work aims to address limitations of liver cancer diagnosis models, such as poor understanding of medical images, insufficient consideration of liver blood vessels, and difficulty in ensuring accurate medical information. We propose a specialized diagnostic assistant to improve the diagnostic capabilities of less experienced doctors. Our framework combines large and small models, using optimized small models for precise perception of patient images. Specifically, a segmentation network iteratively removes ambiguous pixels for liver tumor segmentation, and a multi-scale, multi-level differential network segments liver vessels. Features from these segmentations and medical records form a patient's personalized knowledge base. For diagnosis, Chain-of-Thought (CoT) prompting is used to design prompts that mimic experienced doctors' thought patterns, and Retrieval-Augmented Generation (RAG) provides answers based on reliable domain knowledge and trusted cases. Our small-model methods improve liver tumor and vessel segmentation performance, resulting in more accurate information extraction. In evaluations by doctors, the large-model component scores more than 1 point higher on a 10-point scale than control methods. Our method enhances semantic perception of medical images, improves classification of ambiguous pixels, and optimizes small-object perception. It considers blood vessel positions for specific treatments and improves response credibility and interpretability by mimicking experienced doctors' thought processes using reliable resources. This approach has been recognized by doctors and benefits auxiliary diagnosis of liver cancer.
Submitted 25 June, 2024;
originally announced June 2024.
-
Human-centered In-building Embodied Delivery Benchmark
Authors:
Zhuoqun Xu,
Yang Liu,
Xiaoqi Li,
Jiyao Zhang,
Hao Dong
Abstract:
Recently, the concept of embodied intelligence has been widely accepted and popularized, leading people to naturally consider its potential for commercialization. In this work, we propose a specific commercial scenario simulation: human-centered in-building embodied delivery. For this scenario, we have developed a brand-new virtual environment system from scratch, constructing a multi-level connected building space modeled after a polar research station. This environment also includes autonomous human characters and robots with grasping and mobility capabilities, as well as a large number of interactive items. Based on this environment, we have built a delivery dataset containing 13k language instructions to guide robots in providing services. We simulate human behavior through human characters and sample their various needs in daily life. Finally, we propose a method centered on a large multimodal model to serve as the baseline system for this dataset. Compared with previous embodied data work, our work focuses on a virtual environment centered on human-robot interaction for commercial scenarios. We believe this will bring new perspectives and exploration angles to the embodied community.
Submitted 25 June, 2024;
originally announced June 2024.
-
Local Spherical Collapsing Box in Athena++: Numerical Implementation and Benchmark Tests
Authors:
Ziyan Xu,
Elliot M. Lynch,
Guillaume Laibe
Abstract:
We implement a local model for a spherical collapsing/expanding gas cloud into the Athena++ magnetohydrodynamic code. This local model consists of a Cartesian periodic box with time-dependent geometry. We present a series of benchmark test problems, including non-linear solutions and linear perturbations of the local model, confirming the code's desired performance. During a spherical collapse, a horizontal shear flow is amplified, corresponding to angular momentum conservation of zonal flows in the global problem; wave speed and amplitude of sound waves increase in the local frame, due to the reduction in the characteristic length scale of the box, which can lead to an anisotropic effective sound speed in the local box. Our code conserves both mass and momentum to machine precision. This numerical implementation of the local model has potential applications to the study of local physics and hydrodynamic instabilities during protostellar collapse, providing a powerful framework for better understanding the earliest stages of star and planet formation.
Submitted 25 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
Delay compartment models from a stochastic process
Authors:
Christopher N. Angstmann,
Anna V. McGann,
Zhuang Xu
Abstract:
Compartment models with delay terms are widely used across a range of disciplines. The motivation to include delay terms varies across different contexts. In epidemiological and pharmacokinetic models, the delays are often used to represent an incubation period. In this work, we derive a compartment model with delay terms from an underlying non-Markov stochastic process. Delay terms arise when waiting times are drawn from a delay exponential distribution. This stochastic process approach allows us to preserve the physicality of the model and gives insight into the conditions under which delay terms can arise. By providing the conditions under which the delay exponential function is a probability distribution, we establish a critical value for the delay terms. An exact stochastic simulation method is introduced for the generalized model, enabling us to use the simulation in scenarios where intrinsic stochasticity is significant, such as when the population size is small. We illustrate the applications of the model and validate our simulation algorithm on examples drawn from epidemiology and pharmacokinetics.
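The paper's exact simulation method handles delay exponential waiting times, which this entry does not specify. As a point of reference, the sketch below shows the classic Gillespie algorithm for a Markov SIR model, where all waiting times are ordinary exponentials. It is the memoryless baseline, not the paper's algorithm. The rates `beta` and `gamma` and the frequency-dependent infection term are standard modelling assumptions.

```python
import random

def gillespie_sir(s: int, i: int, r: int, beta: float, gamma: float,
                  t_max: float, seed: int = 0) -> list[tuple[float, int, int, int]]:
    """Exact stochastic simulation (Gillespie) of the Markov SIR model.
    Returns the trajectory as a list of (t, S, I, R) tuples."""
    rng = random.Random(seed)
    n = s + i + r
    t = 0.0
    trajectory = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n   # S + I -> 2I
        recovery_rate = gamma * i           # I -> R
        total_rate = infection_rate + recovery_rate
        if total_rate == 0.0:
            break
        wait = rng.expovariate(total_rate)  # memoryless exponential waiting time
        if t + wait > t_max:
            break
        t += wait
        # Choose which event fires, with probability proportional to its rate.
        if rng.random() * total_rate < infection_rate:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        trajectory.append((t, s, i, r))
    return trajectory
```

Replacing `rng.expovariate` with draws from a non-exponential waiting-time distribution breaks the memoryless property this loop relies on. That is why a non-Markov model such as the one above needs its own exact simulation scheme.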
Submitted 24 June, 2024;
originally announced June 2024.
-
Probing the nature of the $χ_{c1}(3872)$ state using radiative decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1094 additional authors not shown)
Abstract:
The radiative decays $χ_{c1}(3872)\rightarrowψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}}{Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm 0.04, $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.
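The ratio is quoted with three separate uncertainties. A common way to get a single rough figure is to add them in quadrature, which assumes they are independent. The abstract does not state a combined value, so the number below is only an illustration of that arithmetic, not a result from the paper.

```python
import math

def combine_in_quadrature(*uncertainties: float) -> float:
    """Total uncertainty, assuming the individual components are independent."""
    return math.sqrt(sum(u * u for u in uncertainties))

ratio = 1.67
total = combine_in_quadrature(0.21, 0.12, 0.04)  # statistical, systematic, external
print(f"{ratio} +/- {total:.2f}")  # prints "1.67 +/- 0.25"
```

The statistical term dominates. The combined figure therefore barely changes if the small external branching-fraction uncertainty is dropped.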
Submitted 24 June, 2024;
originally announced June 2024.
-
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
Authors:
Henghui Ding,
Chang Liu,
Yunchao Wei,
Nikhila Ravi,
Shuting He,
Song Bai,
Philip Torr,
Deshui Miao,
Xin Li,
Zhenyu He,
Yaowei Wang,
Ming-Hsuan Yang,
Zhensong Xu,
Jiangtao Yao,
Chengjing Wu,
Ting Liu,
Luoqi Liu,
Xinyu Liu,
Jing Zhang,
Kexin Zhang,
Yuting Yang,
Licheng Jiao,
Shuyuan Yang,
Mingqi Gao,
Jingnan Luo
, et al. (12 additional authors not shown)
Abstract:
The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: a Complex Video Object Segmentation track based on the MOSE dataset and a Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations featuring challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset, MeViS, to study natural-language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total; 65 teams participated in the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total; 50 teams participated in the validation phase and 5 teams made valid submissions in the final challenge phase.
Submitted 24 June, 2024;
originally announced June 2024.
-
Dissipative Particle Dynamics and other particle methods for multiphase fluid flow in fractured and porous media
Authors:
Paul Meakin,
Zhijie Xu
Abstract:
Particle methods are less computationally efficient than grid-based numerical solutions of the Navier-Stokes equations. However, they have important advantages, including rigorous mass conservation, momentum conservation, and isotropy. In addition, there is no need for explicit interface tracking or capturing, and the code development effort is relatively low. We describe applications of three particle methods: molecular dynamics, dissipative particle dynamics and smoothed particle hydrodynamics. The mesoscale (between the molecular and continuum scales) dissipative particle dynamics method can be used to simulate systems that are too large to simulate using molecular dynamics but small enough for thermal fluctuations to play an important role.
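To show what makes dissipative particle dynamics both mesoscale and momentum-conserving, the sketch below implements the widely used Groot-Warren pair force. That force has three parts: a soft conservative repulsion, a pairwise friction, and a random thermal kick. The weights obey the fluctuation-dissipation relations w_D = w_R² and σ² = 2γk_BT. This is the textbook form, not the authors' code. The parameter values (a = 25, γ = 4.5, k_BT = 1, r_c = 1) are conventional choices assumed here.

```python
import math
import random
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

def dpd_pair_force(r_i: Vec, r_j: Vec, v_i: Vec, v_j: Vec, *,
                   a: float = 25.0, gamma: float = 4.5, kT: float = 1.0,
                   rc: float = 1.0, dt: float = 0.01,
                   xi: Optional[float] = None) -> Vec:
    """Force on particle i from particle j in standard (Groot-Warren) DPD.

    Pass the same `xi` for the pairs (i, j) and (j, i) so the random force obeys
    Newton's third law; if omitted, a fresh N(0, 1) sample is drawn."""
    rij = [p - q for p, q in zip(r_i, r_j)]
    r = math.sqrt(sum(c * c for c in rij))
    if r == 0.0 or r >= rc:
        return (0.0, 0.0, 0.0)                 # no interaction beyond the cutoff
    e = [c / r for c in rij]                   # unit vector pointing from j to i
    vij = [p - q for p, q in zip(v_i, v_j)]
    e_dot_v = sum(ec * vc for ec, vc in zip(e, vij))
    w_r = 1.0 - r / rc                         # random-force weight (also used for F^C)
    w_d = w_r * w_r                            # fluctuation-dissipation: w_D = w_R**2
    sigma = math.sqrt(2.0 * gamma * kT)        # noise amplitude: sigma**2 = 2*gamma*kT
    if xi is None:
        xi = random.gauss(0.0, 1.0)
    magnitude = (a * w_r                       # soft conservative repulsion
                 - gamma * w_d * e_dot_v       # pairwise friction
                 + sigma * w_r * xi / math.sqrt(dt))  # thermal noise
    return (magnitude * e[0], magnitude * e[1], magnitude * e[2])
```

Every term acts along the line joining the two particles and flips sign when i and j are swapped. Because of this, momentum is conserved exactly and the thermal fluctuations mentioned above come out at the right temperature.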
Submitted 23 June, 2024;
originally announced June 2024.
-
F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data
Authors:
Zexing Xu,
Linjun Zhang,
Sitan Yang,
Rasoul Etesami,
Hanghang Tong,
Huan Zhang,
Jiawei Han
Abstract:
Demand prediction is a crucial task for e-commerce and physical retail businesses, especially during high-stakes sales events. However, the limited availability of historical data from these peak periods poses a significant challenge for traditional forecasting methods. In this paper, we propose a novel approach that leverages strategically chosen proxy data reflective of potential sales patterns from similar entities during non-peak periods, enriched by features learned from a graph neural network (GNN)-based forecasting model, to predict demand during peak events. We formulate demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm, which leverages proxy data from non-peak periods and GNN-generated relational metadata to learn feature-specific layer parameters, thereby adapting to demand forecasts for peak events. Theoretically, we show that by considering domain similarities through task-specific metadata, our model achieves improved generalization, where the excess risk decreases as the number of training tasks increases. Empirical evaluations on large-scale industrial datasets demonstrate the superiority of our approach. Compared with existing state-of-the-art models, our method notably improves demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
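F-FOMAML builds on first-order MAML (FOMAML). The sketch below shows only the generic FOMAML update, not the paper's feature-based variant or its GNN metadata. It uses a hypothetical toy family of 1-D regression tasks y = a·x with model ŷ = w·x. For simplicity, the same noiseless points serve as both the support set and the query set.

```python
def fomaml_linear(task_slopes, xs, w0=0.0, inner_lr=0.1, meta_lr=0.5, meta_steps=100):
    """First-order MAML on toy tasks y = a * x with a one-parameter model y_hat = w * x."""
    m = sum(x * x for x in xs) / len(xs)  # mean of x**2

    def grad(w, a):
        # d/dw of mean((w*x - a*x)**2) = 2 * (w - a) * mean(x**2)
        return 2.0 * (w - a) * m

    w = w0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for a in task_slopes:
            w_adapted = w - inner_lr * grad(w, a)  # inner (task-specific) adaptation
            meta_grad += grad(w_adapted, a)        # first-order: no second derivatives
        w -= meta_lr * meta_grad / len(task_slopes)  # outer (meta) update
    return w
```

The first-order approximation evaluates the task gradient at the adapted parameters and skips differentiation through the inner step. That shortcut keeps FOMAML cheap. On this toy problem, `fomaml_linear([1.0, 2.0, 3.0], [0.5, 1.0])` converges to the mean slope 2.0, the initialization from which one inner step moves closest to every task.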
Submitted 23 June, 2024;
originally announced June 2024.
-
Towards Biologically Plausible Computing: A Comprehensive Comparison
Authors:
Changze Lv,
Yufei Gu,
Zhengkang Guo,
Zhibo Xu,
Yixin Wu,
Feiran Zhang,
Tianyuan Shi,
Zhenghua Wang,
Ruicheng Yin,
Yu Shang,
Siqi Zhong,
Xiaohua Wang,
Muling Wu,
Wenhao Liu,
Tianlong Li,
Jianhao Zhu,
Cenyuan Zhang,
Zixuan Ling,
Xiaoqing Zheng
Abstract:
Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, global error computation, and dual-phase training. To address this long-standing challenge, many studies have endeavored to devise biologically plausible training algorithms. However, a fully biologically plausible algorithm for training multilayer neural networks remains elusive, and interpretations of biological plausibility vary among researchers. In this study, we establish criteria for biological plausibility that a desirable learning algorithm should meet. Using these criteria, we evaluate a range of existing algorithms considered to be biologically plausible, including Hebbian learning, spike-timing-dependent plasticity, feedback alignment, target propagation, predictive coding, forward-forward algorithm, perturbation learning, local losses, and energy-based learning. Additionally, we empirically evaluate these algorithms across diverse network architectures and datasets. We compare the feature representations learned by these algorithms with brain activity recorded by non-invasive devices under identical stimuli, aiming to identify which algorithm can most accurately replicate brain activity patterns. We are hopeful that this study could inspire the development of new biologically plausible algorithms for training multilayer networks, thereby fostering progress in both the fields of neuroscience and machine learning.
Submitted 23 June, 2024;
originally announced June 2024.
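The weight-symmetry and locality criteria named in the abstract above can be made concrete with a minimal sketch. This is our own illustration, not code from the study: it contrasts how backpropagation and feedback alignment compute the hidden-layer error, and shows a plain Hebbian update. All function names and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def hidden_error_backprop(w2: Matrix, out_err: Vector) -> Vector:
    # Backprop: the hidden-layer error is W2^T e, so the backward pathway
    # needs an exact (symmetric) copy of the forward weights W2.
    return matvec(transpose(w2), out_err)


def hidden_error_feedback_alignment(b: Matrix, out_err: Vector) -> Vector:
    # Feedback alignment: W2^T is replaced by a fixed random matrix B
    # (hidden x output) that is never trained, so no weight symmetry is needed.
    return matvec(b, out_err)


def hebbian_update(w: Matrix, pre: Vector, post: Vector, lr: float) -> Matrix:
    # Plain Hebbian rule: dW[i][j] = lr * post[i] * pre[j].
    # It uses only locally available activity and no global error signal.
    return [[w[i][j] + lr * post[i] * pre[j] for j in range(len(pre))]
            for i in range(len(post))]


w2 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 outputs x 2 hidden units
b = [[0.5, -1.0, 0.0], [0.0, 1.0, 2.0]]     # fixed random feedback, 2 x 3
err = [1.0, 0.0, -1.0]
print(hidden_error_backprop(w2, err))            # [-4.0, -4.0]
print(hidden_error_feedback_alignment(b, err))   # [0.5, -2.0]
```

The two error signals differ, which is the point: feedback alignment trains the forward weights so that they come to work with B, rather than requiring the backward path to mirror them.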
-
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
Authors:
Xinrong Zhang,
Yingfa Chen,
Shengding Hu,
Xu Han,
Zihang Xu,
Yuanwei Xu,
Weilin Zhao,
Maosong Sun,
Zhiyuan Liu
Abstract:
As large language models (LLMs) increasingly permeate daily life, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can listen to users while generating output and dynamically adjust themselves to provide users with instant feedback. Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset that consists of alternating time slices of queries and responses and covers typical feedback types in instantaneous interactions. Our experiments show that although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluations indicate that duplex models make user-AI interactions more natural and human-like, and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.
Submitted 21 June, 2024;
originally announced June 2024.
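The time-slicing idea in the duplex-model abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' format: slices are measured in tokens rather than wall-clock time, and the role labels `"user"` and `"model"` as well as the helper names are hypothetical.

```python
from itertools import zip_longest
from typing import List, Tuple


def time_slices(tokens: List[str], slice_len: int) -> List[List[str]]:
    """Cut a token stream into consecutive slices of at most slice_len tokens."""
    return [tokens[i:i + slice_len] for i in range(0, len(tokens), slice_len)]


def tdm_interleave(query: List[str], response: List[str],
                   slice_len: int) -> List[Tuple[str, List[str]]]:
    """Alternate user and model slices in one sequence (a TDM-style layout).

    Within time step t the user slice comes first, then the model slice, so the
    model's output at step t can condition on everything the user said so far.
    """
    sequence: List[Tuple[str, List[str]]] = []
    for q, r in zip_longest(time_slices(query, slice_len),
                            time_slices(response, slice_len)):
        if q:  # the user may be silent in this time slice
            sequence.append(("user", q))
        if r:  # the model may have nothing left to say
            sequence.append(("model", r))
    return sequence


for speaker, chunk in tdm_interleave("a b c d e".split(), "x y z".split(), 2):
    print(speaker, chunk)
```

Because both streams share one sequence, an ordinary decoder can process them pseudo-simultaneously; the fine-tuning data described in the abstract would teach the model what to emit in each slot.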
-
Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge
Authors:
Han Liu,
Hao Li,
Jiacheng Wang,
Yubo Fan,
Zhoubing Xu,
Ipek Oguz
Abstract:
Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents in microscopy images. However, this invasive procedure may perturb or even kill the cells, and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the fluorescently labeled images from label-free microscopy. In this paper, we propose a deep learning-based in silico labeling method for the Light My Cells challenge. Built upon pix2pix, our proposed method can be trained on partially labeled datasets with an adaptive loss. Moreover, we explore the effectiveness of several training strategies for handling different input modalities, such as training them together or separately. The results show that our method achieves promising performance for in silico labeling. Our code is available at https://github.com/MedICL-VU/LightMyCells.
Submitted 21 June, 2024;
originally announced June 2024.
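Training on partially labeled data, as in the Light My Cells abstract above, requires a loss that ignores channels without ground truth. The abstract does not spell out its adaptive loss, so the sketch below is only a plain masked L1 term (pix2pix itself pairs an adversarial term with an L1 reconstruction term); the function name and data layout are hypothetical.

```python
from typing import List, Optional


def masked_l1(pred: List[List[float]],
              target: List[Optional[List[float]]]) -> float:
    """Mean absolute error over only the channels that have ground truth.

    pred[c] is the flattened predicted image for fluorescent channel c;
    target[c] is None when that label was not acquired for this sample.
    """
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        if t is None:
            continue  # missing label: this channel contributes no loss
        total += sum(abs(a - b) for a, b in zip(p, t))
        count += len(t)
    return total / count if count else 0.0


pred = [[1.0, 2.0], [0.0, 0.0], [5.0, 5.0]]
target = [[1.0, 1.0], None, [4.0, 7.0]]
print(masked_l1(pred, target))  # 1.0
```

Averaging over labeled pixels only keeps the loss scale comparable between samples that carry one label and samples that carry several.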
-
Development and Comprehensive Evaluation of TMR Sensor-Based Magnetrodes
Authors:
Jiahui Luo,
Zhaojie Xu,
Zhenhu **,
Mixia Wang,
Xinxia Cai,
Jiamin Chen
Abstract:
Due to their compact size and exceptional sensitivity at room temperature, magnetoresistance (MR) sensors have garnered considerable interest in numerous fields, particularly for the detection of weak magnetic signals in biological systems. Magnetrodes, which integrate MR sensors with needle-shaped Si-based substrates, are designed to be inserted into the brain for local magnetic field detection. Although recent research has predominantly focused on giant magnetoresistance (GMR) sensors, tunnel magnetoresistance (TMR) sensors exhibit significantly higher sensitivity. In this study, we introduce TMR-based magnetrodes featuring TMR sensors at both the tip and the mid-section of the probe, enabling detection of local magnetic fields at different spatial positions. To enhance detectivity, we have designed and fabricated magnetrodes with varied aspect ratios of the free layer, incorporating diverse junction shapes, quantities, and serial arrangements. Using a custom-built magnetotransport and noise measurement system for characterization, our TMR-based magnetrode demonstrates a limit of detection (LOD) of 300 pT/Hz$^{1/2}$ at 1 kHz. This implies that neuronal spikes can be distinguished with minimal averaging, thereby facilitating the elucidation of their magnetic properties.
Submitted 14 May, 2024;
originally announced June 2024.
-
Sublattice Dichotomy in Monolayer FeSe Superconductor
Authors:
Cui Ding,
Zhipeng Xu,
Xiaotong Jiao,
Qiyin Hu,
Wenxuan Zhao,
Lexian Yang,
Kun Jiang,
**-Feng Jia,
Lili Wang,
Jiang** Hu,
Qi-Kun Xue
Abstract:
The pairing mechanism in monolayer FeSe is an essential question for iron-based superconductors. In this work, we show that the sublattice degree of freedom of monolayer FeSe plays a special role in its pairing properties, namely the sublattice dichotomy. High-quality monolayer FeSe samples with atomically flat $1\times1$ topography are grown on SrTiO$_3$(001) substrates by molecular beam epitaxy. By comparing the tunneling spectra at the $α$ and $β$ Fe sublattices, we find that the coherence peak of $α$-Fe at the inner gap $+V_i$ is higher than that of $β$-Fe, while the coherence peak of $β$-Fe at $-V_i$ is higher than that of $α$-Fe by a similar amount. We also observe a reversed effect at the outer gap $\pm V_o$. We propose that $η$-pairing between $k$ and $-k+Q$ is the key mechanism for this unconventional sublattice dichotomy effect.
Submitted 21 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies of 4.914 and 4.946 GeV by the BESIII detector, a search for the $e^+e^- \to φχ_{c1}(3872)$ process is performed for the first time. No significant signal is observed, and upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set at 0.85 and 0.96 pb, respectively. These measurements provide useful information on the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
Holistic Evaluation for Interleaved Text-and-Image Generation
Authors:
Minqian Liu,
Zhiyang Xu,
Zihao Lin,
Trevor Ashby,
Joy Rimchala,
Jiaxin Zhang,
Lifu Huang
Abstract:
Interleaved text-and-image generation has been an intriguing research direction, in which models are required to generate both images and text pieces in an arbitrary order. Despite emerging advancements in interleaved generation, progress in its evaluation still lags significantly behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs and outputs, and they only cover a limited number of domains and use cases. Also, current works predominantly use similarity-based metrics, which fall short in assessing quality in open-ended scenarios. To this end, we introduce InterleavedBench, the first benchmark carefully curated for the evaluation of interleaved text-and-image generation. InterleavedBench features a rich array of tasks to cover diverse real-world use cases. In addition, we present InterleavedEval, a strong reference-free metric powered by GPT-4o to deliver accurate and explainable evaluation. We carefully define five essential evaluation aspects for InterleavedEval, including text quality, perceptual quality, image coherence, text-image coherence, and helpfulness, to ensure a comprehensive and fine-grained assessment. Through extensive experiments and rigorous human evaluation, we show that our benchmark and metric can effectively evaluate existing models, achieving a stronger correlation with human judgments than previous reference-based metrics. We also provide substantial findings and insights to foster future research in interleaved generation and its evaluation.
Submitted 20 June, 2024;
originally announced June 2024.
-
XENONnT WIMP Search: Signal & Background Modeling and Statistical Inference
Authors:
XENON Collaboration,
E. Aprile,
J. Aalbers,
K. Abe,
S. Ahmed Maouloud,
L. Althueser,
B. Andrieu,
E. Angelino,
D. Antón Martin,
F. Arneodo,
L. Baudis,
M. Bazyk,
L. Bellagamba,
R. Biondi,
A. Bismark,
K. Boese,
A. Brown,
G. Bruno,
R. Budnik,
J. M. R. Cardoso,
A. P. Cimental Chávez,
A. P. Colijn,
J. Conrad,
J. J. Cuenca-García,
V. D'Andrea
, et al. (139 additional authors not shown)
Abstract:
The XENONnT experiment searches for weakly interacting massive particle (WIMP) dark matter scattering off xenon nuclei. In particular, XENONnT uses a dual-phase time projection chamber with a 5.9-tonne liquid xenon target, detecting both scintillation and ionization signals to reconstruct the energy, position, and type of recoil. A blind search for nuclear recoil WIMPs with an exposure of 1.1 tonne-years yielded no signal excess over background expectations, from which competitive exclusion limits were derived on WIMP-nucleon elastic scattering cross sections for WIMP masses ranging from 6 GeV/$c^2$ up to the TeV/$c^2$ scale. This work details the modeling and statistical methods employed in this search. Using calibration data, we model the detector response, which is then used to derive background and signal models. The construction and validation of these models are discussed, alongside additional purely data-driven backgrounds. We also describe the statistical inference framework, including the definition of the likelihood function and the construction of confidence intervals.
Submitted 19 June, 2024;
originally announced June 2024.
-
NTIRE 2024 Challenge on Night Photography Rendering
Authors:
Egor Ershov,
Artyom Panshin,
Oleg Karasev,
Sergey Korchagin,
Shepelev Lev,
Alexandr Startsev,
Daniil Vladimirov,
Ekaterina Zaychenkova,
Nikola Banić,
Dmitrii Iarchuk,
Maria Efimova,
Radu Timofte,
Arseniy Terekhin,
Shuwei Yue,
Yuyang Liu,
Minchen Wei,
Lu Xu,
Chao Zhang,
Yasi Wang,
Furkan Kınlı,
Doğa Yılmaz,
Barış Özcan,
Furkan Kıraç,
Shuai Liu,
**gyuan Xiao
, et al. (25 additional authors not shown)
Abstract:
This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike in the previous year's competition, the challenge images were collected with a mobile phone, and the speed of the algorithms was measured alongside the quality of their output. To evaluate the results, a sufficient number of viewers were asked to assess the visual quality of the proposed solutions, given the subjective nature of the task. There were two nominations: quality and efficiency. The top 5 solutions in terms of output quality were sorted by evaluation time (see Fig. 1). The top-ranking participants' solutions effectively represent the state of the art in nighttime photography rendering. More results can be found at https://nightimaging.org.
Submitted 18 June, 2024;
originally announced June 2024.
-
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Authors:
Fengqing Jiang,
Zhangchen Xu,
Luyao Niu,
Bill Yuchen Lin,
Radha Poovendran
Abstract:
Large language models (LLMs) are expected to follow instructions from users and engage in conversations. Techniques to enhance LLMs' instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates have been shown to be effective in optimizing LLM performance, their impact on the safety alignment of LLMs is less well understood, even though it is crucial for deploying LLMs safely at scale.
In this paper, we investigate how chat templates affect the safety alignment of LLMs. We identify a common vulnerability, named ChatBug, that is introduced by chat templates. Our key insight in identifying ChatBug is that chat templates provide a rigid format that needs to be followed by LLMs, but not by users. Hence, a malicious user may not necessarily follow the chat template when prompting LLMs. Instead, malicious users could leverage their knowledge of the chat template and craft their prompts accordingly to bypass the safety alignment of LLMs. We develop two attacks to exploit the ChatBug vulnerability. We demonstrate that a malicious user can exploit the ChatBug vulnerability of eight state-of-the-art (SOTA) LLMs and effectively elicit unintended responses from these models. Moreover, we show that ChatBug can be exploited by existing jailbreak attacks to enhance their attack success rates. We investigate potential countermeasures to ChatBug. Our results show that while adversarial training effectively mitigates the ChatBug vulnerability, the victim model incurs significant performance degradation. These results highlight the trade-off between safety alignment and helpfulness. Developing new methods for instruction tuning to balance this trade-off is an open and critical direction for future research.
Submitted 16 June, 2024;
originally announced June 2024.
-
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
Authors:
Yuetai Li,
Zhangchen Xu,
Fengqing Jiang,
Luyao Niu,
Dinuka Sahabandu,
Bhaskar Ramasubramanian,
Radha Poovendran
Abstract:
The remarkable performance of large language models (LLMs) in generation tasks has enabled practitioners to leverage publicly available models to power custom applications, such as chatbots and virtual assistants. However, the data used to train or fine-tune these LLMs is often undisclosed, allowing an attacker to compromise the data and inject backdoors into the models. In this paper, we develop a novel inference-time defense, named CleanGen, to mitigate backdoor attacks for generation tasks in LLMs. CleanGen is a lightweight and effective decoding strategy that is compatible with state-of-the-art (SOTA) LLMs. Our insight behind CleanGen is that, compared to other LLMs, backdoored LLMs assign significantly higher probabilities to tokens representing the attacker-desired contents. These discrepancies in token probabilities enable CleanGen to identify suspicious tokens favored by the attacker and replace them with tokens generated by another LLM that is not compromised by the same attacker, thereby avoiding generation of attacker-desired content. We evaluate CleanGen against five SOTA backdoor attacks. Our results show that CleanGen achieves lower attack success rates (ASR) than five SOTA baseline defenses for all five backdoor attacks. Moreover, LLMs deploying CleanGen maintain helpfulness in their responses to benign user queries with minimal added computational overhead.
Submitted 18 June, 2024;
originally announced June 2024.
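The probability-discrepancy check described in the CleanGen abstract above can be illustrated with a toy decoding step. This is a minimal sketch, not the authors' algorithm: the ratio test, the `ratio_threshold` value, and the function name are hypothetical choices, and real next-token distributions would come from two separate models rather than hand-written dictionaries.

```python
from typing import Dict


def clean_step(suspect: Dict[str, float], reference: Dict[str, float],
               ratio_threshold: float = 20.0, eps: float = 1e-12) -> str:
    """Pick the next token, falling back to a reference model when suspicious.

    suspect / reference map candidate tokens to next-token probabilities from
    the possibly backdoored model and from an independent reference model.
    """
    token = max(suspect, key=suspect.get)  # greedy choice of the suspect model
    ratio = suspect[token] / max(reference.get(token, 0.0), eps)
    if ratio > ratio_threshold:
        # Far more likely under the suspect model than under the reference:
        # treat it as attacker-favoured and use the reference model's token.
        return max(reference, key=reference.get)
    return token


print(clean_step({"hello": 0.6, "BUY": 0.4}, {"hello": 0.5, "BUY": 0.1}))  # hello
print(clean_step({"BUY": 0.9, "hi": 0.1}, {"BUY": 0.01, "hi": 0.7}))       # hi
```

Because the check only touches tokens the suspect model strongly prefers, benign generations pass through unchanged, which matches the abstract's claim of preserved helpfulness at low overhead.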
-
Unusual charge density wave introduced by Janus structure in monolayer vanadium dichalcogenides
Authors:
Ziqiang Xu,
Yan Shao,
Chun Huang,
Genyu Hu,
Shihao Hu,
Zhi-Lin Li,
Xiaoyu Hao,
Yanhui Hou,
Teng Zhang,
**-An Shi,
Chen Liu,
Jia-Ou Wang,
Wu Zhou,
Jiadong Zhou,
Wei Ji,
**gsi Qiao,
Xu Wu,
Hong-Jun Gao,
Yeliang Wang
Abstract:
As a fundamental structural feature, the symmetry of materials determines the exotic quantum properties of transition metal dichalcogenides (TMDs) with charge density waves (CDWs). By breaking inversion symmetry, the Janus structure, an artificially constructed lattice, provides an opportunity to tune CDW states and related properties. However, limited by the difficulties in atomic-level fabrication and material stability, experimental visualization of CDW states in 2D TMDs with Janus structure is still rare. Here, using surface selenization of VTe$_2$, we fabricated monolayer Janus VTeSe. With scanning tunneling microscopy, an unusual $\sqrt{13}\times\sqrt{13}$ CDW state with threefold rotational symmetry breaking was observed and characterized. Combined with theoretical calculations, we find that this CDW state can be attributed to the charge modulation in the Janus VTeSe, beyond conventional electron-phonon coupling. Our findings provide a promising platform for studying CDW states and artificially tuning electronic properties toward applications.
Submitted 17 June, 2024;
originally announced June 2024.
-
Precision measurement of the $Ξ^-_b$ baryon lifetime
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1064 additional authors not shown)
Abstract:
A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second systematic. This value is averaged with the corresponding value from Run 1 to obtain ${r_τ^{\rm Run\,1,2} = 1.078\pm0.012\pm0.007}$. Multiplying by the world-average value of the $Λ^0_b$ lifetime yields $τ_{Ξ^-_b}^{\rm Run~1,2} = 1.578\pm0.018\pm0.010\pm0.011$ ps, where the uncertainties are statistical, systematic, and due to the limited knowledge of the $Λ^0_b$ lifetime. This measurement improves the precision of the current world average of the $Ξ^-_b$ lifetime by about a factor of two, and is in good agreement with the most recent theoretical predictions.
Submitted 17 June, 2024;
originally announced June 2024.
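The conversion from the lifetime ratio to an absolute lifetime in the LHCb abstract above is a single multiplication, and the quoted numbers can be checked by hand. The world-average $Λ^0_b$ lifetime is not stated in the abstract; the value of about 1.464 ps below is back-computed from the quoted results, not a cited figure, and the propagation assumes each uncertainty simply scales by the other factor.

```latex
\tau_{\Xi^-_b} = r_\tau\,\tau_{\Lambda^0_b}
\;\Rightarrow\;
\tau_{\Lambda^0_b} \approx \frac{1.578~\text{ps}}{1.078} \approx 1.464~\text{ps},
\qquad
\sigma_{\text{stat}} \approx 0.012 \times 1.464~\text{ps} \approx 0.018~\text{ps},
\qquad
\sigma_{\text{syst}} \approx 0.007 \times 1.464~\text{ps} \approx 0.010~\text{ps},
\qquad
\sigma_{\tau_{\Lambda^0_b}} \approx \frac{0.011~\text{ps}}{1.078} \approx 0.010~\text{ps}.
```

The statistical and systematic terms reproduce the quoted 0.018 and 0.010 ps, and the third term implies an uncertainty of roughly 0.010 ps on the external $Λ^0_b$ lifetime.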
-
FinTruthQA: A Benchmark Dataset for Evaluating the Quality of Financial Information Disclosure
Authors:
Ziyue Xu,
Peilin Zhou,
Xinyu Shi,
Jiageng Wu,
Yikang Jiang,
Bin Ke,
Jie Yang
Abstract:
Accurate and transparent financial information disclosure is crucial in the fields of accounting and finance, ensuring market efficiency and investor confidence. Among many information disclosure platforms, the Chinese stock exchanges' investor interactive platform provides a novel and interactive way for listed firms to disclose information of interest to investors through an online question-and-answer (Q&A) format. However, it is common for listed firms to respond to questions with limited or no substantive information, and automatically evaluating the quality of financial information disclosure on large amounts of Q&A pairs is challenging. This paper builds FinTruthQA, a benchmark that can evaluate advanced natural language processing (NLP) techniques for the automatic quality assessment of information disclosure in financial Q&A data. FinTruthQA comprises 6,000 real-world financial Q&A entries, and each Q&A was manually annotated based on four conceptual dimensions of accounting. We benchmarked various NLP techniques on FinTruthQA, including statistical machine learning models, pre-trained language models and their fine-tuned versions, as well as the large language model GPT-4. Experiments showed that existing NLP models have strong predictive ability for the real question identification and question relevance tasks, but are suboptimal for the answer relevance and answer readability tasks. By establishing this benchmark, we provide a robust foundation for the automatic evaluation of information disclosure, significantly enhancing the transparency and quality of financial reporting. FinTruthQA can be used by auditors, regulators, and financial analysts for real-time monitoring and data-driven decision-making, as well as by researchers for advanced studies in accounting and finance, ultimately fostering greater trust and efficiency in the financial markets.
Submitted 17 June, 2024;
originally announced June 2024.
-
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results
Authors:
Jiaqi Wang,
Yuhang Zang,
Pan Zhang,
Tao Chu,
Yuhang Cao,
Zeyi Sun,
Ziyu Liu,
Xiaoyi Dong,
Tong Wu,
Dahua Lin,
Zeming Chen,
Zhi Wang,
Lingchen Meng,
Wenhao Yao,
Jianwei Yang,
Sihong Wu,
Zhineng Chen,
Zuxuan Wu,
Yu-Gang Jiang,
Peixi Wu,
Bosong Chai,
Xuan Nie,
Longquan Yan,
Zeyu Wang,
Qifan Zhou
, et al. (9 additional authors not shown)
Abstract:
Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories and potential encounters with previously unknown or unseen objects. These challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3Det Challenge 2024 in conjunction with the 4th Open World Vision Workshop: Visual Perception via Learning in an Open World (VPLOW) at CVPR 2024, Seattle, US. This challenge aims to push the boundaries of object detection research and encourage innovation in this field. The V3Det Challenge 2024 consists of two tracks: 1) Vast Vocabulary Object Detection: this track focuses on detecting objects from a large set of 13,204 categories, testing the detection algorithm's ability to recognize and locate diverse objects. 2) Open Vocabulary Object Detection: this track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects. In the following sections, we provide a comprehensive summary and analysis of the solutions submitted by participants. By analyzing the methods and solutions presented, we aim to inspire future research directions in vast vocabulary and open-vocabulary object detection, driving progress in this field. Challenge homepage: https://v3det.openxlab.org.cn/challenge
Submitted 17 June, 2024;
originally announced June 2024.
-
Extracting $α_\mathrm{S}$ at future $e^+e^{-}$ Higgs factory with energy correlators
Authors:
Zhen Lin,
Manqi Ruan,
Meng Xiao,
Zhen Xu
Abstract:
The projected sensitivity of an $α_\mathrm{S}$ determination at a future electron-positron collider using an event-shape observable, the ratio of energy correlators, is presented. The study focuses on the collinear region, which has suffered from large theoretical and hadronization uncertainties in the past. The ratio effectively reduces the impact of these uncertainties. With the amount of data that a future electron-positron collider could produce in 1 minute (40 $\text{pb}^{-1}$) and 0.5 hour (1 $\text{fb}^{-1}$), precisions of 1% and 0.2% on $α_\mathrm{S}$, respectively, could be reached.
Submitted 16 June, 2024;
originally announced June 2024.
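The two working points in the $α_\mathrm{S}$ abstract above are consistent with simple statistical scaling: $\sqrt{1000/40} = 5$, and 1% divided by 5 is 0.2%. The sketch below checks this under the assumption, not stated in the abstract, that the precision is statistics-limited and scales as $1/\sqrt{L}$; the function and constant names are hypothetical.

```python
import math


def scaled_precision(ref_precision: float, ref_lumi: float, new_lumi: float) -> float:
    """Rescale a statistics-limited relative precision to a new luminosity.

    Assumes precision ~ 1/sqrt(L); any systematic floor is ignored.
    """
    return ref_precision * math.sqrt(ref_lumi / new_lumi)


LUMI_PER_MINUTE = 40.0              # pb^-1 per minute, as quoted in the abstract
half_hour = 30 * LUMI_PER_MINUTE    # 1200 pb^-1, i.e. about 1.2 fb^-1
print(half_hour)                                # 1200.0
print(scaled_precision(1.0, 40.0, 1000.0))      # about 0.2 (percent, at 1 fb^-1 = 1000 pb^-1)
```

Half an hour at the quoted rate gives about 1.2 fb$^{-1}$, so the abstract's "1 fb$^{-1}$" reads as a rounded figure; if systematic uncertainties dominated, the gain from more data would be smaller than this scaling suggests.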
-
Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask
Authors:
**gyu Xiao,
Zhiyao Xu,
Qingsong Zou,
Qing Li,
Dan Zhao,
Dong Fang,
Ruoyu Li,
Wenxin Tang,
Kang Li,
Xudong Zuo,
Penghui Hu,
Yong Jiang,
Zixuan Weng,
Michael R. Lyv
Abstract:
Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations by users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effectively learn less frequent behaviors, consider temporal context, or account for the impact of noise in human behaviors. In this paper, we propose SmartGuard, an autoencoder-based unsupervised user behavior anomaly detection framework. First, we design a Loss-guided Dynamic Mask Strategy (LDMS) to encourage the model to learn less frequent behaviors, which are often overlooked during learning. Second, we propose a Three-level Time-aware Position Embedding (TTPE) that incorporates temporal information into the positional embedding to detect temporal context anomalies. Third, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) that assigns different weights to routine behaviors and noise behaviors to mitigate the interference of noise behaviors during inference. Comprehensive experiments on three datasets with ten types of anomalous behaviors demonstrate that SmartGuard consistently outperforms state-of-the-art baselines and also offers highly interpretable results.
Submitted 18 June, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
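The Loss-guided Dynamic Mask Strategy in the SmartGuard abstract above aims to make the autoencoder focus on behaviors it learns poorly. The abstract does not say how positions are chosen, so the sketch below assumes one simple reading: deterministically mask the positions with the highest recent reconstruction loss. The function name and `mask_ratio` parameter are hypothetical, and the actual strategy may be stochastic or change over training.

```python
from typing import List


def loss_guided_mask(token_losses: List[float], mask_ratio: float) -> List[bool]:
    """Return a mask that hides the behaviours the model currently reconstructs worst.

    token_losses[i] is the latest reconstruction loss of behaviour i in a
    sequence; high-loss (often rarer) behaviours are masked first.
    """
    if not token_losses:
        return []
    n_mask = max(1, int(len(token_losses) * mask_ratio))
    by_loss = sorted(range(len(token_losses)),
                     key=lambda i: token_losses[i], reverse=True)
    masked = set(by_loss[:n_mask])
    return [i in masked for i in range(len(token_losses))]


print(loss_guided_mask([0.1, 0.9, 0.3, 0.7], 0.5))  # [False, True, False, True]
```

Masking by loss rather than uniformly at random means frequent, easy behaviors stop dominating the training signal, which is the motivation the abstract gives for LDMS.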