-
MIRAI: Evaluating LLM Agents for Event Forecasting
Authors:
Chenchen Ye,
Ziniu Hu,
Yihe Deng,
Zijie Huang,
Mingyu Derek Ma,
Yanqiao Zhu,
Wei Wang
Abstract:
Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information and reason over it to solve complex problems. Given this capability, increasing interest has been directed toward employing LLM agents to predict international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, there is no rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs that enable LLM agents to use different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code using domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge from diverse formats and time periods to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.
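To make the relational prediction task concrete, here is a minimal sketch, assuming a toy in-memory list of (date, subject, relation, object) tuples standing in for the GDELT-derived database. The frequency baseline and the names `Event`, `predict_relations`, and `f1` are hypothetical illustrations, not MIRAI's API or evaluation code: the baseline predicts the relations between two countries from events observed before a cutoff date and scores them against a ground-truth relation set with set-based F1.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    day: date
    subject: str
    relation: str  # relation code (hypothetical values in the example below)
    obj: str

def predict_relations(events, subject, obj, cutoff, top_k=2):
    """Frequency baseline: the top_k most frequent relations between subject and obj before cutoff."""
    counts = Counter(
        e.relation for e in events
        if e.day < cutoff and e.subject == subject and e.obj == obj
    )
    return {rel for rel, _ in counts.most_common(top_k)}

def f1(predicted, gold):
    """Set-based F1 between predicted and ground-truth relation codes."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

history = [
    Event(date(2023, 11, 1), "AUS", "036", "CHN"),
    Event(date(2023, 11, 5), "AUS", "036", "CHN"),
    Event(date(2023, 11, 9), "AUS", "042", "CHN"),
    Event(date(2023, 11, 20), "AUS", "112", "CHN"),  # after the cutoff, so it is ignored
]
pred = predict_relations(history, "AUS", "CHN", cutoff=date(2023, 11, 15))
print(pred, f1(pred, {"036", "057"}))
```

Moving the cutoff earlier or later is one simple way to emulate shorter or longer forecasting horizons with such a baseline.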
Submitted 1 July, 2024;
originally announced July 2024.
-
Investigation of the actuation system of the inertial sensor for high-precision space missions using torsion pendulum
Authors:
Fangchao Yang,
Yan Zhu,
Xiaofei **,
Yujie Zhao,
Shixun Pei,
Wei Hong
Abstract:
Precision space inertial sensors are imperative to Earth geodesy missions, gravitational wave observations and several fundamental physics experiments in space. In these missions, the residual acceleration noise of the test mass (TM), caused by forces from the inertial sensor components and the environment, must be kept below a certain level. Since many of the forces contributing to residual acceleration are related to the actuation system, developing a precise actuation system that excludes erroneous forces and achieves an ultra-low TM acceleration noise level is essential. However, it is difficult to test the actuation system on the ground. In this paper, a torsion pendulum is established to test the influence of the actuation system on TM torque noise, and a closed-loop control system combining the torsion pendulum with parts of the actuation modules is designed to assess the performance of the actuation control algorithm. The experimental results show that the parameters of the actuation system introduce additional torque noise, which can reach as much as $10^{-13}\,\mathrm{N\,m/Hz^{1/2}}$ at 1 mHz. The stable tracking error of the closed-loop system is about $10^{-7}$, indicating that the combined system achieves good tracking performance and robustness for TM rotation control under different inertial sensor conditions.
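The closed-loop tracking idea can be illustrated with a toy simulation. This is a minimal sketch, assuming a single-axis torsion pendulum with made-up, dimensionless inertia, torsion constant, and damping, driven by a PD controller with spring feed-forward; it is not the paper's actuation control algorithm, and the parameter values do not correspond to the experiment.

```python
def simulate_pd(theta_ref, kp=4.0, kd=4.0, inertia=1.0, kappa=1.0, damping=0.1, dt=0.01, steps=2000):
    """Semi-implicit Euler simulation of a torsion pendulum
    inertia * theta'' = -kappa * theta - damping * theta' + tau,
    with tau from a PD controller plus spring feed-forward. Returns the angle history."""
    theta, omega = 0.0, 0.0
    history = []
    for _ in range(steps):
        error = theta_ref - theta
        # Feed-forward term kappa * theta_ref cancels the spring torque at the setpoint,
        # so the steady-state tracking error goes to zero.
        tau = kp * error - kd * omega + kappa * theta_ref
        alpha = (-kappa * theta - damping * omega + tau) / inertia
        omega += alpha * dt  # update angular velocity first (semi-implicit Euler)...
        theta += omega * dt  # ...then the angle, using the new velocity
        history.append(theta)
    return history

angles = simulate_pd(theta_ref=1e-3)
print(f"final tracking error: {abs(angles[-1] - 1e-3):.2e}")
```

With these gains the closed loop is slightly underdamped and settles within a few time units; the real system's noise floor, not the controller, would limit the achievable error.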
Submitted 1 July, 2024;
originally announced July 2024.
-
Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation
Authors:
Yuting Zhang,
Yiqing Wu,
Ruidong Han,
Ying Sun,
Yongchun Zhu,
Xiang Li,
Wei Lin,
Fuzhen Zhuang,
Zhulin An,
Yongjun Xu
Abstract:
Recommendation systems, which assist users in discovering their preferred items among numerous options, have served billions of users across various online platforms. Intuitively, users' interactions with items are highly driven by their unchanging inherent intents (e.g., always preferring high-quality items) and changing demand intents (e.g., wanting a T-shirt in summer but a down jacket in winter). However, both types of intents are implicitly expressed in the recommendation scenario, posing challenges in leveraging them for accurate intent-aware recommendations. Fortunately, in the search scenario, which often coexists with recommendation on the same online platform, users express their demand intents explicitly through their query words. Moreover, across both scenarios, a user shares the same inherent intent, and the interactions may be influenced by the same demand intent. It is therefore feasible to utilize the interaction data from both scenarios to reinforce the dual intents for joint intent-aware modeling. Nevertheless, such joint modeling must address two problems: 1) accurately modeling users' implicit demand intents in recommendation; 2) modeling the relation between the dual intents and the interactive items. To address these problems, we propose a novel model named Unified Dual-Intents Translation for joint modeling of Search and Recommendation (UDITSR). To accurately simulate users' demand intents in recommendation, we utilize real queries from search data as supervision information to guide their generation. To explicitly model the relation among the triplet <inherent intent, demand intent, interactive item>, we propose a dual-intent translation propagation mechanism to learn the triplet in the same semantic space via embedding translations. Extensive experiments demonstrate that UDITSR outperforms SOTA baselines in both search and recommendation tasks.
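The "embedding translation" idea can be illustrated with a TransE-style score, assuming the triplet is modeled as inherent intent + demand intent ≈ item embedding. This is a minimal sketch: the vectors and the functions `translation_score` and `rank_items` are hypothetical, and UDITSR's actual propagation mechanism is more involved than a single distance.

```python
import math

def translation_score(inherent, demand, item):
    """Negative Euclidean distance ||inherent + demand - item||; higher means a better fit."""
    return -math.sqrt(sum((u + d - v) ** 2 for u, d, v in zip(inherent, demand, item)))

def rank_items(inherent, demand, items):
    """Return item names sorted from best to worst translation score."""
    return sorted(items, key=lambda name: translation_score(inherent, demand, items[name]), reverse=True)

# Toy 3-d embeddings (hypothetical): one user, two seasonal demand intents.
user = [0.5, 0.0, 0.2]
summer = [0.0, 1.0, 0.0]
winter = [0.0, -1.0, 0.0]
items = {"t_shirt": [0.5, 1.0, 0.2], "down_jacket": [0.5, -1.0, 0.2]}

print(rank_items(user, summer, items))  # ['t_shirt', 'down_jacket']
print(rank_items(user, winter, items))  # ['down_jacket', 't_shirt']
```

The same inherent-intent vector yields different rankings purely through the demand-intent term, which mirrors the T-shirt/down-jacket example in the abstract.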
Submitted 30 June, 2024;
originally announced July 2024.
-
Physical Layer Deception with Non-Orthogonal Multiplexing
Authors:
Wenwen Chen,
Bin Han,
Yao Zhu,
Anke Schmeink,
Giuseppe Caire,
Hans D. Schotten
Abstract:
Physical layer security (PLS) is a promising technology to secure wireless communications by exploiting the physical properties of the wireless channel. However, the passive nature of PLS creates a significant imbalance between the effort required by eavesdroppers and legitimate users to secure data. To address this imbalance, in this article, we propose a novel framework of physical layer deception (PLD), which combines PLS with deception technologies to actively counteract wiretapping attempts. Combining a two-stage encoder with randomized ciphering and non-orthogonal multiplexing, the PLD approach enables the wireless communication system to proactively counter eavesdroppers with deceptive messages. Relying solely on the superiority of the legitimate channel over the eavesdropping channel, the PLD framework can effectively protect the confidentiality of the transmitted messages, even against eavesdroppers who possess knowledge equivalent to that of the legitimate receiver. We prove the validity of the PLD framework with in-depth analyses and demonstrate its superiority over conventional PLS approaches with comprehensive numerical benchmarks.
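The reliance on "the superiority of the legitimate channel over the eavesdropping channel" echoes the classical Gaussian wiretap result, where the secrecy capacity is positive only when the legitimate receiver's SNR exceeds the eavesdropper's. The sketch below computes that textbook quantity as background for conventional PLS; it is not the PLD framework itself, and the SNR values are arbitrary.

```python
import math

def secrecy_capacity(snr_bob, snr_eve):
    """Gaussian wiretap channel: C_s = max(0, log2(1 + SNR_B) - log2(1 + SNR_E)) bits per channel use.
    Both SNRs are linear ratios, not dB."""
    return max(0.0, math.log2(1 + snr_bob) - math.log2(1 + snr_eve))

def db_to_linear(db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10 ** (db / 10)

print(secrecy_capacity(15.0, 3.0))  # log2(16) - log2(4) = 2.0
print(secrecy_capacity(3.0, 15.0))  # eavesdropper's channel is better -> 0.0
```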
Submitted 30 June, 2024;
originally announced July 2024.
-
RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering
Authors:
Weikai Lin,
Yu Feng,
Yuhao Zhu
Abstract:
Point-Based Neural Rendering (PBNR), i.e., the 3D Gaussian Splatting-family algorithms, has emerged as a promising class of rendering techniques that are permeating all aspects of society, driven by a growing demand for real-time, photorealistic rendering in AR/VR and digital twins. Achieving real-time PBNR on mobile devices is challenging.
This paper proposes RTGS, a PBNR system that, for the first time, delivers real-time neural rendering on mobile devices while maintaining visual quality as perceived by humans. RTGS combines two techniques. First, we present an efficiency-aware pruning technique to optimize rendering speed. Second, we introduce a Foveated Rendering (FR) method for PBNR, leveraging humans' low visual acuity in peripheral regions to relax rendering quality and improve rendering speed. Our system executes in real time (above 100 FPS) on an Nvidia Jetson Xavier board without sacrificing subjective visual quality, as confirmed by a user study. The code is open-sourced at https://github.com/horizon-research/Fov-3DGS.
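The foveation principle (relax quality as distance from the gaze point grows) can be sketched with eccentricity bands. This is a minimal illustration under stated assumptions: the band limits, the "fraction of Gaussians kept" per band, and the pixels-per-degree value are all hypothetical, and RTGS's actual FR method for PBNR is not specified in the abstract.

```python
import math

# Hypothetical eccentricity bands (degrees) and the fraction of Gaussians kept in each band.
# A real foveated renderer would calibrate these against user studies.
BANDS = [(5.0, 1.0), (15.0, 0.5), (30.0, 0.25)]
PERIPHERY_KEEP = 0.1

def eccentricity_deg(px, py, gaze_x, gaze_y, pixels_per_degree):
    """Approximate visual angle between a pixel and the gaze point (linear pixels-per-degree model)."""
    return math.hypot(px - gaze_x, py - gaze_y) / pixels_per_degree

def keep_fraction(ecc):
    """Fraction of Gaussians to rasterize for a pixel at the given eccentricity."""
    for limit, fraction in BANDS:
        if ecc <= limit:
            return fraction
    return PERIPHERY_KEEP

gaze = (960, 540)
ppd = 40.0  # assumed pixels per degree for the display
for pixel in [(960, 540), (1360, 540), (1900, 1000)]:
    ecc = eccentricity_deg(*pixel, *gaze, ppd)
    print(pixel, round(ecc, 1), keep_fraction(ecc))
```

Banding keeps the per-pixel decision cheap, which matters when the goal is to save work rather than add it.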
Submitted 29 June, 2024;
originally announced July 2024.
-
JSCDS: A Core Data Selection Method with Jason-Shannon Divergence for Caries RGB Images-Efficient Learning
Authors:
Peiliang Zhang,
Yujia Tong,
Chenghu Du,
Chao Che,
Yongjun Zhu
Abstract:
Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, making efficient deployment challenging. Core data selection, by eliminating low-quality and confusing data, aims to enhance training efficiency without significantly compromising model performance. However, distance-based data selection methods struggle to distinguish dependencies among high-dimensional caries data. To address this issue, we propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification. We describe the core data selection criterion as the distribution of samples in different classes. JSCDS calculates the cluster centers from sample embedding representations in the caries classification network and utilizes Jensen-Shannon Divergence to compute the mutual information between data samples and cluster centers, capturing nonlinear dependencies among high-dimensional data. The average mutual information is calculated to fit the above distribution, serving as the criterion for constructing the core set for model training. Extensive experiments on RGB caries datasets show that JSCDS outperforms other data selection methods in prediction performance and time consumption. Notably, JSCDS exceeds the performance of the full-dataset model with only 50% of the core data, and its performance advantage becomes more pronounced with 70% of the core data.
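The divergence-to-center step can be sketched as follows. This is a minimal illustration, assuming embeddings are turned into distributions with a softmax and that, per class, the samples with the lowest Jensen-Shannon divergence to the class center are kept; that selection rule and the names `select_core` and `ratio` are hypothetical, not JSCDS's exact mutual-information criterion.

```python
import math

def softmax(xs):
    """Turn an embedding vector into a probability distribution (shifted for stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence 0.5*KL(p||m) + 0.5*KL(q||m), m = (p + q) / 2; bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_core(embeddings, labels, ratio):
    """Per class, keep the `ratio` fraction of samples closest (lowest JSD) to the class center.
    Hypothetical selection rule for illustration only."""
    core = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        dim = len(embeddings[idx[0]])
        center = [sum(embeddings[i][d] for i in idx) / len(idx) for d in range(dim)]
        p_center = softmax(center)
        ranked = sorted(idx, key=lambda i: js_divergence(softmax(embeddings[i]), p_center))
        keep = max(1, round(ratio * len(idx)))
        core.extend(ranked[:keep])
    return sorted(core)

# Toy 2-d embeddings: index 3 and index 7 sit far from the rest of their class.
embeddings = [[2, 0], [2, 0], [2, 0], [0, 2], [0, 2], [0, 2], [0, 2], [2, 0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(select_core(embeddings, labels, ratio=0.75))  # [0, 1, 2, 4, 5, 6]
```

Unlike a Euclidean distance, JSD compares distributions and is bounded, which is one reason it is attractive for high-dimensional embeddings.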
Submitted 29 June, 2024;
originally announced July 2024.
-
Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
S. Ahmed,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (495 additional authors not shown)
Abstract:
Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.
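As a quick check on the size of the total uncertainty, assuming the statistical and systematic components are independent and can be added in quadrature (a common convention; the abstract itself does not quote a combined value):

```latex
\sigma_{\text{tot}} = \sqrt{\sigma_{\text{stat}}^2 + \sigma_{\text{syst}}^2}
  = \sqrt{0.10^2 + 0.04^2}\,\% = \sqrt{0.0116}\,\% \approx 0.11\,\%,
\qquad
\frac{\sigma_{\text{tot}}}{0.59\,\%} \approx 18\,\%
```

Under that assumption the ratio reads roughly $(0.59 \pm 0.11)\%$, so the measurement is statistics-dominated.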
Submitted 28 June, 2024;
originally announced July 2024.
-
A Personalised Learning Tool for Physics Undergraduate Students Built On a Large Language Model for Symbolic Regression
Authors:
Yufan Zhu,
Zi-Yu Khoo,
Jonathan Sze Choong Low,
Stephane Bressan
Abstract:
Interleaved practice enhances the memory and problem-solving ability of students in undergraduate courses. We introduce a personalized learning tool built on a Large Language Model (LLM) that can provide immediate and personalized attention to students as they complete homework containing problems interleaved from undergraduate physics courses. Our tool leverages the dimensional analysis method, enhancing students' qualitative thinking and problem-solving skills for complex phenomena. Our approach combines LLMs for symbolic regression with dimensional analysis via prompt engineering and offers students a unique perspective to comprehend relationships between physics variables. This fosters a broader and more versatile understanding of physics and mathematical principles and complements a conventional undergraduate physics education that relies on interpreting and applying established equations within specific contexts. We test our personalized learning tool on the equations from Feynman's lectures on physics. Our tool can correctly identify relationships between physics variables for most equations, underscoring its value as a complementary personalized learning tool for undergraduate physics students.
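The dimensional-analysis step the tool builds on can be shown with the classic pendulum example: find exponents $a, b, c$ such that the period $T \propto l^a g^b m^c$ has dimensions of time. This is a minimal sketch of the textbook linear-algebra procedure, not the LLM-based symbolic regression the tool uses; the names `DIMS`, `TARGET`, and `solve_exponents` are hypothetical.

```python
from fractions import Fraction

# Dimensions as exponent vectors over the base dimensions (M, L, T).
DIMS = {
    "length": (0, 1, 0),
    "g": (0, 1, -2),
    "mass": (1, 0, 0),
}
TARGET = (0, 0, 1)  # the period has the dimension of time

def solve_exponents(variables, target):
    """Solve sum_j x_j * dim(var_j) = target for the exponents x_j by Gauss-Jordan elimination
    over exact fractions. Assumes a square, non-singular system (exactly one solution)."""
    n = len(variables)
    # Augmented matrix: one row per base dimension, one column per variable, plus the target.
    rows = [[Fraction(DIMS[v][r]) for v in variables] + [Fraction(target[r])] for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [x / pv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return {v: rows[i][n] for i, v in enumerate(variables)}

print(solve_exponents(["length", "g", "mass"], TARGET))
# {'length': Fraction(1, 2), 'g': Fraction(-1, 2), 'mass': Fraction(0, 1)}
```

The result $T \propto \sqrt{l/g}$, independent of mass, is exactly the kind of variable relationship the tool asks students to reason about.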
Submitted 17 June, 2024;
originally announced July 2024.
-
Module control of network analysis in psychopathology
Authors:
Chunyu Pan,
Quan Zhang,
Yue Zhu,
Shengzhou Kong,
Juan Liu,
Changsheng Zhang,
Fei Wang,
Xizhe Zhang
Abstract:
The network approach to characterizing psychopathology departs from traditional latent categorical and dimensional approaches. Causal interplay among symptoms contributes to a dynamic psychopathology system. Therefore, analyzing symptom clusters is critical for understanding mental disorders. Furthermore, despite extensive research on the topological features of symptom networks, the control relationships between symptoms remain largely unclear. Here, we present a novel systematizing concept, module control, to analyze the control principle of the symptom network at the module level. We introduce the Module Control Network (MCN) to identify key modules that regulate the network's behavior. By applying our approach to a multivariate psychological dataset, we discover that non-emotional modules, such as sleep-related and stress-related modules, are the primary controlling modules in the symptom network. Our findings indicate that module control can expose the central symptom clusters governing the psychopathology network, offering novel insights into the underlying mechanisms of mental disorders and an individualized approach to psychological interventions.
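For background on what "control" means in a directed network, the classical node-level structural-controllability result identifies driver nodes via a maximum matching: nodes left unmatched must receive an external input, and at least one driver is always needed. The sketch below implements that node-level computation with Kuhn's augmenting-path algorithm; it is not the paper's module-level MCN, and the symptom network is hypothetical.

```python
def max_matching(nodes, edges):
    """Maximum matching in the bipartite graph (out-copies -> in-copies) of a directed network,
    via Kuhn's augmenting-path algorithm. Returns {target: source}."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    match_in = {}  # in-copy (target) -> out-copy (source)

    def try_augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-matched elsewhere.
            if v not in match_in or try_augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())
    return match_in

def driver_nodes(nodes, edges):
    """Nodes whose in-copy is unmatched need an external control input.
    If every node is matched (e.g. a single cycle), one driver node still suffices."""
    matched = max_matching(nodes, edges)
    drivers = [n for n in nodes if n not in matched]
    return drivers if drivers else [nodes[0]]

# Hypothetical symptom network: stress -> sleep -> fatigue -> mood, plus stress -> mood.
symptoms = ["stress", "sleep", "fatigue", "mood"]
links = [("stress", "sleep"), ("sleep", "fatigue"), ("fatigue", "mood"), ("stress", "mood")]
print(driver_nodes(symptoms, links))  # ['stress']
```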
Submitted 30 May, 2024;
originally announced July 2024.
-
YuLan: An Open-source Large Language Model
Authors:
Yutao Zhu,
Kun Zhou,
Kelong Mao,
Wentong Chen,
Yiding Sun,
Zhipeng Chen,
Qian Cao,
Yihan Wu,
Yushuo Chen,
Feng Wang,
Lei Zhang,
Junyi Li,
Xiaolei Wang,
Lei Wang,
Beichen Zhang,
Zican Dong,
Xiaoxue Cheng,
Yuhan Chen,
Xinyu Tang,
Yupeng Hou,
Qiangqiang Ren,
Xincheng Pang,
Shufang Xie,
Wayne Xin Zhao,
Zhicheng Dou
, et al. (13 additional authors not shown)
Abstract:
Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billion parameters. The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was completed in January 2024, and the model achieves performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and code are available at https://github.com/RUC-GSAI/YuLan-Chat.
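The easy-to-hard ordering behind curriculum learning can be sketched as sorting by a difficulty score and splitting into stages. This is a minimal illustration, assuming token count as a stand-in difficulty proxy; the function `curriculum_stages` and the proxy are hypothetical, since the abstract does not say how YuLan measures difficulty.

```python
def curriculum_stages(samples, difficulty, num_stages):
    """Sort samples easy-to-hard by `difficulty` and split them into `num_stages` contiguous
    stages whose sizes differ by at most one."""
    ordered = sorted(samples, key=difficulty)
    base, extra = divmod(len(ordered), num_stages)
    stages, start = [], 0
    for k in range(num_stages):
        end = start + base + (1 if k < extra else 0)
        stages.append(ordered[start:end])
        start = end
    return stages

# Hypothetical difficulty proxy: number of whitespace-separated tokens.
docs = ["a b", "a b c d e f", "a", "a b c", "a b c d e f g h", "a b c d"]
for stage, batch in enumerate(curriculum_stages(docs, lambda s: len(s.split()), 3), start=1):
    print(stage, batch)
```

In practice the difficulty signal might come from model loss or data source rather than length; the staging logic stays the same.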
Submitted 28 June, 2024;
originally announced June 2024.
-
MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
Authors:
**ming Li,
Yichen Zhu,
Zhiyuan Xu,
**dong Gu,
Minjie Zhu,
Xin Liu,
Ning Liu,
Yaxin Peng,
Feifei Feng,
Jian Tang
Abstract:
It is fundamentally challenging for robots to serve as useful assistants in human environments because this requires addressing a spectrum of sub-problems across robotics, including perception, language understanding, reasoning, and planning. Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated their exceptional abilities in solving complex mathematical problems and in commonsense and abstract reasoning. This has led to the recent utilization of MLLMs as the brain in robotic systems, enabling these models to conduct high-level planning prior to triggering low-level control actions for task execution. However, it remains uncertain whether existing MLLMs are reliable in serving as the brain of robots. In this study, we introduce the Multimodal LLM for Robotic (MMRo) benchmark, the first benchmark for evaluating the capability of MLLMs for robot applications. Specifically, we identify four essential capabilities (perception, task planning, visual reasoning, and safety measurement) that MLLMs must possess to qualify as the robot's central processing unit. We have developed several scenarios for each capability, resulting in a total of 14 metrics for evaluation. We present experimental results for various MLLMs, including both commercial and open-source models, to assess the performance of existing systems. Our findings indicate that no single model excels in all areas, suggesting that current MLLMs are not yet trustworthy enough to serve as the cognitive core for robots. Our data can be found at https://mm-robobench.github.io/.
Submitted 28 June, 2024;
originally announced June 2024.
-
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Authors:
Yuang Zhang,
Jiaxi Gu,
Li-Wen Wang,
Han Wang,
Junqi Cheng,
Yuefeng Zhu,
Fangyuan Zou
Abstract:
In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed MimicMotion, which can generate high-quality videos of arbitrary length mimicking specific motion guidance. Compared with previous methods, our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which significantly reduces image distortion. Lastly, for generating long and smooth videos, we propose a progressive latent fusion strategy. By this means, we can produce videos of arbitrary length with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in various aspects. Detailed results and comparisons are available on our project page: https://tencent.github.io/MimicMotion .
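Generating arbitrary-length video usually means producing overlapping segments and blending the shared frames. The sketch below shows a generic linear cross-fade over the overlap; it is an assumption-laden illustration of the idea, not MimicMotion's progressive latent fusion, and it uses plain floats where a real model would use latent tensors. The function name `fuse_segments` is hypothetical.

```python
def fuse_segments(segments, overlap):
    """Concatenate per-frame latents from consecutive segments that share `overlap` frames,
    cross-fading the shared frames with linearly ramped weights.
    Latents are plain floats here for illustration; real latents are tensors."""
    fused = list(segments[0])
    for seg in segments[1:]:
        for k in range(overlap):
            w = (k + 1) / (overlap + 1)  # weight on the new segment grows across the overlap
            idx = len(fused) - overlap + k
            fused[idx] = (1 - w) * fused[idx] + w * seg[k]
        fused.extend(seg[overlap:])
    return fused

a = [0.0, 0.0, 0.0, 0.0]
b = [1.0, 1.0, 1.0, 1.0]
print(fuse_segments([a, b], overlap=2))
```

The ramp avoids a hard cut between segments; the cost of each extra segment stays constant, which is what keeps resource use bounded for long videos.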
Submitted 28 June, 2024;
originally announced June 2024.
-
Improved measurement of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Analyzing $e^+e^-$ collision data corresponding to an integrated luminosity of $7.33~\mathrm{fb}^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we measure the branching fraction of the semileptonic decay $D^+_{s}\to K^0 e^+ν_e$ to be $(2.98\pm0.23\pm0.12)\times10^{-3}$. The $D_s^+\to K^0$ hadronic form factor is determined from the differential decay rate of $D^+_s\to K^0 e^+ν_e$ to be $f^{K^0}_+(0)=0.636\pm0.049\pm0.013$. For both measurements, the first uncertainty is statistical and the second systematic. The branching fraction and form factor measurements are factors of 1.6 and 1.7 more precise than the previous world averages, respectively.
Submitted 27 June, 2024;
originally announced June 2024.
-
TrustUQA: A Trustful Framework for Unified Structured Data Question Answering
Authors:
Wen Zhang,
Long **,
Yushan Zhu,
Jiaoyan Chen,
Zhiwei Huang,
Junjie Wang,
Yin Hua,
Lei Liang,
Huajun Chen
Abstract:
Natural language question answering (QA) over structured data sources such as tables and knowledge graphs (KGs) has been widely investigated, for example with Large Language Models (LLMs). The main solutions include question-to-formal-query parsing and retrieval-based answer generation. However, current methods of the former often suffer from weak generalization and fail to deal with multiple sources simultaneously, while the latter is limited in trustfulness. In this paper, we propose UnifiedTQA, a trustful QA framework that can simultaneously support multiple types of structured data in a unified way. To this end, it adopts an LLM-friendly and unified knowledge representation method called Condition Graph (CG), and uses an LLM and demonstration-based two-level method for CG querying. For enhancement, it is also equipped with dynamic demonstration retrieval. We have evaluated UnifiedTQA with 5 benchmarks covering 3 types of structured data. It outperforms 2 existing unified structured data QA methods, and in comparison with the baselines that are specific to a data type, it achieves state-of-the-art results on 2 of them. Furthermore, we demonstrate the potential of our method for more general QA tasks: QA over mixed structured data and QA across structured data.
Submitted 27 June, 2024;
originally announced June 2024.
-
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Authors:
Yi Zhu,
Tiago Falk
Abstract:
Speech is known to carry health-related attributes, which has made it a novel avenue for remote and long-term health monitoring. However, existing models are usually tailored to a specific type of disease and have been shown to lack generalizability across datasets. Furthermore, concerns have recently been raised about the leakage of speaker identity from health embeddings. To mitigate these limitations, we propose WavRx, a speech health diagnostics model that captures respiration- and articulation-related dynamics from a universal speech representation. Our in-domain and cross-domain experiments on six pathological speech datasets establish WavRx as a new state-of-the-art health diagnostic model. Furthermore, we show that the amount of speaker identity entailed in the WavRx health embeddings is significantly reduced without extra guidance during training. An in-depth analysis of the model was performed, providing a physiological interpretation of its improved generalizability and privacy-preserving ability.
Submitted 26 June, 2024;
originally announced June 2024.
-
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Authors:
Guanting Dong,
Yutao Zhu,
Chenghao Zhang,
Zechen Wang,
Zhicheng Dou,
Ji-Rong Wen
Abstract:
Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs). However, aligning the retriever with the diverse knowledge preferences of LLMs poses an inevitable challenge in developing a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align diverse knowledge preferences within RAG systems. Specifically, we first introduce a preference knowledge construction pipeline and incorporate five novel query augmentation strategies to alleviate preference data scarcity. Based on the preference data, DPA-RAG accomplishes both external and internal preference alignment: 1) it jointly integrates pair-wise, point-wise, and contrastive preference alignment abilities into the reranker, achieving external preference alignment among RAG components; 2) it further introduces a pre-aligned stage before vanilla Supervised Fine-tuning (SFT), enabling LLMs to implicitly capture knowledge aligned with their reasoning preferences, achieving the LLMs' internal alignment. Experimental results across four knowledge-intensive QA datasets demonstrate that DPA-RAG outperforms all baselines and seamlessly integrates both black-box and open-source LLM readers. Further qualitative analysis and discussion also provide empirical guidance for achieving reliable RAG systems. Our code is publicly available at https://github.com/dongguanting/DPA-RAG.
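The three reranker objectives named above have common textbook forms, sketched below on scalar reranker scores. This is a minimal illustration, assuming Bradley-Terry style pair-wise loss, binary cross-entropy point-wise loss, and InfoNCE-style contrastive loss; whether DPA-RAG uses exactly these forms, and how it weights them, is not stated in the abstract.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_loss(s_pos, s_neg):
    """-log sigmoid(s_pos - s_neg): small when the preferred document outscores the rejected one."""
    return softplus(-(s_pos - s_neg))

def pointwise_loss(score, label):
    """Binary cross-entropy with logits for one document (label 1 = preferred by the reader LLM)."""
    return softplus(score) - label * score

def contrastive_loss(s_pos, s_negs, temperature=1.0):
    """InfoNCE-style loss: -log of the positive's softmax probability among positive + negatives."""
    logits = [s_pos / temperature] + [s / temperature for s in s_negs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# A preferred document scored 2.0 against a rejected one scored 0.0.
print(pairwise_loss(2.0, 0.0), pointwise_loss(2.0, 1), contrastive_loss(2.0, [0.0, -1.0]))
```

Writing every loss through `softplus` or a max-shifted log-sum-exp keeps the computation stable for large score gaps.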
Submitted 26 June, 2024;
originally announced June 2024.
-
GlobalTomo: A global dataset for physics-ML seismic wavefield modeling and FWI
Authors:
Shiqian Li,
Zhi Li,
Zhancun Mu,
Shiji Xin,
Zhixiang Dai,
Kuangdai Leng,
Ruihua Zhang,
Xiaodong Song,
Yixin Zhu
Abstract:
Global seismic tomography, taking advantage of seismic waves from natural earthquakes, provides essential insights into the earth's internal dynamics. Advanced Full-waveform Inversion (FWI) techniques, whose aim is to meticulously interpret every detail in seismograms, confront formidable computational demands in forward modeling and adjoint simulations on a global scale. Recent advancements in Machine Learning (ML) offer transformative potential for improving the computational efficiency of FWI and extending its applicability to larger scales. This work presents the first 3D global synthetic dataset tailored for seismic wavefield modeling and full-waveform tomography, referred to as the GlobalTomo dataset. This dataset is uniquely comprehensive, incorporating explicit wave physics and robust geophysical parameterization at realistic global scales, generated through state-of-the-art forward simulations optimized for 3D global wavefield calculations. Through extensive analysis and the establishment of ML baselines, we illustrate that ML approaches are particularly suitable for global FWI, overcoming its limitations with rapid forward modeling and flexible inversion strategies. This work represents a cross-disciplinary effort to enhance our understanding of the earth's interior through physics-ML modeling.
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurement of the cross sections of $e^+e^-\to K^{-}\barΞ^{+}Λ/Σ^{0}$ at center-of-mass energies between 3.510 and 4.914 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 3.510 and 4.914 GeV, corresponding to an integrated luminosity of 25 fb$^{-1}$, we measure the Born cross sections for the process $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$ at thirty-five energy points with a partial-reconstruction strategy. By fitting the dressed cross sections of $e^+e^-\to K^-\barΞ^+Λ/Σ^{0}$, evidence for $ψ(4160) \to K^{-}\barΞ^{+}Λ$ is found for the first time with a significance of 4.4$σ$, including systematic uncertainties. No evidence for other possible resonances is found. In addition, the products of electronic partial width and branching fraction for all assumed resonances decaying into $K^{-}\barΞ^{+}Λ/Σ^{0}$ are determined.
Submitted 26 June, 2024;
originally announced June 2024.
-
Artificial Immune System of Secure Face Recognition Against Adversarial Attacks
Authors:
Min Ren,
Yunlong Wang,
Yuhao Zhu,
Yongzhen Huang,
Zhenan Sun,
Qi Li,
Tieniu Tan
Abstract:
Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be achieved by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored and underutilised in insect farming. Here we present a comprehensive review of the selective breeding framework in the context of insect production. We systematically evaluate adjustments of selective breeding techniques to the realm of insects and highlight the essential components integral to the breeding process. The discussion covers every step of a conventional breeding scheme, such as formulation of breeding objectives, phenotyping, estimation of genetic parameters and breeding values, selection of appropriate breeding strategies, and mitigation of issues associated with genetic diversity depletion and inbreeding. This review combines knowledge from diverse disciplines, bridging the gap between animal breeding, quantitative genetics, evolutionary biology, and entomology, offering an integrated view of the insect breeding research area and uniting knowledge which has previously remained scattered across diverse fields of expertise.
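The review's steps of estimating genetic parameters and choosing a selection strategy meet in the textbook breeder's equation, $R = h^2 S$, where $S$ is the selection differential (mean of selected parents minus population mean) and $h^2$ is narrow-sense heritability. A minimal sketch; the body-mass values, the 20% selection fraction, and $h^2 = 0.3$ are hypothetical illustration values, not figures from the review.

```python
def selection_differential(population, selected):
    """S: mean of the selected parents minus the population mean."""
    return sum(selected) / len(selected) - sum(population) / len(population)

def expected_response(h2, s):
    """Breeder's equation R = h^2 * S: expected shift of the offspring mean per generation."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("narrow-sense heritability must lie in [0, 1]")
    return h2 * s

# Hypothetical larval body masses (mg); keep the heaviest 20% as parents.
population = [100, 110, 95, 120, 105, 130, 90, 115, 100, 125]
selected = sorted(population, reverse=True)[:2]
s = selection_differential(population, selected)  # 127.5 - 109.0
r = expected_response(0.3, s)
print(s, r)
```

Higher selection intensity raises $S$ but shrinks the breeding population, which is exactly the trade-off against genetic diversity depletion and inbreeding that the review discusses.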
Submitted 26 June, 2024;
originally announced June 2024.
-
Measurements of $K_S^0$-$K_L^0$ asymmetries in the decays $Λ_c^+ \to pK_{L,S}^0$, $pK_{L,S}^0π^+π^-$ and $pK_{L,S}^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data sets corresponding to an integrated luminosity of 4.5 $\text{fb}^{-1}$, collected with the BESIII detector at center-of-mass energies between 4.600 and 4.699 GeV, we report the first measurements of the absolute branching fractions $\mathcal{B}(Λ_c^+\to pK_{L}^{0})=(1.67 \pm 0.06 \pm 0.04)\%$, $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^+π^-)=(1.69 \pm 0.10 \pm 0.05)\%$, and $\mathcal{B}(Λ_c^+\to pK_{L}^{0}π^0)=(2.02 \pm 0.13 \pm 0.05)\%$, where the first uncertainties are statistical and the second systematic. Combining these with the known branching fractions of $Λ_c^+ \to pK_{S}^{0}$, $Λ_c^+ \to pK_{S}^{0}π^+π^-$, and $Λ_c^+ \to pK_{S}^{0}π^0$, we present the first measurements of the $K_{S}^{0}$-$K_{L}^{0}$ asymmetries $R(Λ_c^+, K_{S,L}^0X) = \frac{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) - \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}{\mathcal{B}(Λ_c^+ \to K_{S}^{0} X) + \mathcal{B}(Λ_c^+ \to K_{L}^{0} X)}$ in charmed baryon decays: $R(Λ_c^+, pK_{S,L}^0) = -0.025 \pm 0.031$, $R(Λ_c^+, pK_{S,L}^0π^+π^-) = -0.027 \pm 0.048$, and $R(Λ_c^+, pK_{S,L}^0π^0) =-0.015 \pm 0.046$. No significant asymmetries within the uncertainties are observed.
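The asymmetry $R$ defined above is a simple ratio, and it can be inverted as $\mathcal{B}_S = \mathcal{B}_L (1 + R)/(1 - R)$. A minimal arithmetic sketch using only the central values quoted for $Λ_c^+ \to pK^0_{S,L}$ (uncertainties and their correlations are ignored, so this is a consistency check, not a re-analysis):

```python
def ks_kl_asymmetry(b_s: float, b_l: float) -> float:
    """R = (B_S - B_L) / (B_S + B_L) for a K_S^0 / K_L^0 pair of branching fractions."""
    return (b_s - b_l) / (b_s + b_l)

def implied_b_s(r: float, b_l: float) -> float:
    """Solve R(B_S + B_L) = B_S - B_L for B_S, i.e. B_S = B_L (1 + R) / (1 - R)."""
    return b_l * (1 + r) / (1 - r)

# Central values from the abstract: B(pK_L) = 1.67 %, R(pK_{S,L}) = -0.025.
b_s = implied_b_s(-0.025, 1.67)
print(round(b_s, 2))  # 1.59 (percent)
```

A negative $R$ simply means the $K_S^0$ mode is slightly smaller than the $K_L^0$ mode; with an uncertainty of 0.031 the value is compatible with zero, as the abstract states.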
Submitted 26 June, 2024;
originally announced June 2024.
-
Study of the $f_{0}(980)$ through the decay $D_{s}^{+}\rightarrow π^{+}π^{+}π^{-}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (649 additional authors not shown)
Abstract:
We perform the first amplitude analysis of $D^+_s \to π^+π^+π^-π^0$ decays, based on data samples of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of 7.33~fb$^{-1}$. We report the observation of $D_{s}^{+} \to f_0(980)ρ(770)^{+}$ with a statistical significance greater than 10$σ$ and determine the branching fractions $\mathcal{B}(D_s^+\toπ^+π^+π^-π^0|_{{\rm non}-η})=(2.04\pm0.08_{\rm stat.}\pm0.05_{\rm syst.})\%$ and $\mathcal{B}(D_s^+\toηπ^+)=(1.56\pm0.09_{\rm stat.}\pm0.04_{\rm syst.})\%$. Moreover, we measure the relative branching fraction between $φ\toπ^+π^-π^0$ and $φ\to K^+K^-$ to be $\frac{\mathcal{B}(φ(1020) \to π^+π^-π^0)}{\mathcal{B}(φ(1020) \to K^+K^-)}=0.230 \pm 0.014_{\rm stat.} \pm 0.010_{\rm syst.}$, which deviates from the world average value by more than $4σ$.
Submitted 25 June, 2024;
originally announced June 2024.
-
MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework
Authors:
Xusheng Xu,
Jiangyu Cui,
Zidong Cui,
Runhong He,
Qingyu Li,
Xiaowei Li,
Yanling Lin,
Jiale Liu,
Wuxin Liu,
Jiale Lu,
Maolin Luo,
Chufan Lyu,
Shijie Pan,
Mosharev Pavel,
Runqiu Shu,
Jialiang Tang,
Ruoqian Xu,
Shu Xu,
Kang Yang,
Fan Yu,
Qingguo Zeng,
Haiying Zhao,
Qiang Zheng,
Junyuan Zhou,
Xu Zhou
, et al. (14 additional authors not shown)
Abstract:
We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.
Submitted 27 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Efficient Evolutionary Search Over Chemical Space with Large Language Models
Authors:
Haorui Wang,
Marta Skreta,
Cher-Tian Ser,
Wenhao Gao,
Lingkai Kong,
Felix Streith-Kalthoff,
Chenru Duan,
Yuchen Zhuang,
Yue Yu,
Yanqiao Zhu,
Yuanqi Du,
Alán Aspuru-Guzik,
Kirill Neklyudov,
Chao Zhang
Abstract:
Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations. In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs. Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings. We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations. Our code is available at http://github.com/zoom-wang112358/MOLLEO
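The abstract describes replacing the random crossover and mutation operators of an evolutionary algorithm with LLM-driven ones. A minimal sketch of the surrounding EA loop with the operators passed in as plain functions, so an LLM-backed operator could be swapped in. Everything here is hypothetical: molecules are toy strings over "CNO" rather than validated SMILES, the fitness (count of "N") stands in for an expensive property oracle, and the random operators shown are the baseline that MOLLEO replaces; no LLM call is implemented.

```python
import random
from typing import Callable, List

Molecule = str  # placeholder: a SMILES string in a real system

def random_crossover(a: Molecule, b: Molecule, rng: random.Random) -> Molecule:
    # Baseline operator: splice a prefix of one parent onto the suffix of the other.
    cut = rng.randint(0, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def random_mutation(m: Molecule, rng: random.Random, alphabet: str = "CNO") -> Molecule:
    # Baseline operator: replace one random character.
    if not m:
        return rng.choice(alphabet)
    i = rng.randrange(len(m))
    return m[:i] + rng.choice(alphabet) + m[i + 1:]

def evolve(population: List[Molecule],
           fitness: Callable[[Molecule], float],
           crossover: Callable[[Molecule, Molecule, random.Random], Molecule],
           mutate: Callable[[Molecule, random.Random], Molecule],
           generations: int = 20,
           seed: int = 0) -> List[float]:
    """Elitist EA; returns the best fitness after each generation."""
    rng = random.Random(seed)
    pop = list(population)
    history = []
    for _ in range(generations):
        children = []
        for _ in range(len(pop)):
            a, b = rng.sample(pop, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Every child costs one objective evaluation; keep the best of parents + children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:len(population)]
        history.append(fitness(pop[0]))
    return history

history = evolve(["CCCC", "CCOC", "OCCC", "CCCO"], fitness=lambda m: m.count("N"),
                 crossover=random_crossover, mutate=random_mutation)
print(history)
```

Because each child requires an objective evaluation, operators that propose better children (the role the abstract assigns to chemistry-aware LLMs) reduce the number of evaluations needed to reach a given fitness.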
Submitted 23 June, 2024;
originally announced June 2024.
-
Enhancing Wearable based Real-Time Glucose Monitoring via Phasic Image Representation Learning based Deep Learning
Authors:
Yidong Zhu,
Nadia B Aimandi,
Mohammad Arif Ul Alam
Abstract:
In the U.S., over a third of adults are pre-diabetic, with 80\% unaware of their status. This underlines the need for better glucose monitoring to prevent type 2 diabetes and related heart diseases. Existing wearable glucose monitors are limited by a lack of models that perform well when trained on small datasets, as collecting extensive glucose data is often costly and impractical. Our study introduces a novel machine learning method using modified recurrence plots in the frequency domain to improve glucose level prediction accuracy from wearable device data, even with limited datasets. This technique combines advanced signal processing with machine learning to extract more meaningful features. We tested our method against existing models using historical data, showing that our approach surpasses the current 87\% accuracy benchmark in predicting real-time interstitial glucose levels.
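For readers unfamiliar with recurrence plots: a standard recurrence plot marks $R_{ij} = 1$ whenever two delay-embedded states of a signal lie within a threshold $\varepsilon$ of each other, turning a 1-D sensor trace into a 2-D image that an image model can consume. A minimal sketch of that standard time-domain construction only; the paper's frequency-domain modification is not described in the abstract and is not reproduced here, and the embedding dimension, delay, threshold, and trace are hypothetical.

```python
from typing import List, Sequence

def delay_embed(signal: Sequence[float], dim: int = 2, delay: int = 1) -> List[List[float]]:
    """Delay embedding: state t is (x_t, x_{t+delay}, ..., x_{t+(dim-1)*delay})."""
    n = len(signal) - (dim - 1) * delay
    return [[signal[t + k * delay] for k in range(dim)] for t in range(n)]

def recurrence_plot(states: List[List[float]], eps: float) -> List[List[int]]:
    """Binary recurrence matrix: R[i][j] = 1 if ||s_i - s_j|| <= eps (Euclidean), else 0."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return [[1 if dist(a, b) <= eps else 0 for b in states] for a in states]

# Hypothetical periodic sensor trace (period 4 samples).
trace = [0.0, 1.0, 0.0, -1.0] * 3
rp = recurrence_plot(delay_embed(trace, dim=2, delay=1), eps=0.1)
for row in rp[:4]:
    print(row)
```

A periodic signal shows up as diagonal lines spaced one period apart; irregular dynamics break those lines up, which is the kind of texture an image-based learner can pick up.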
Submitted 12 June, 2024;
originally announced June 2024.
-
DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task
Authors:
Wenhan Liu,
Yutao Zhu,
Zhicheng Dou
Abstract:
Recently, there has been increasing interest in applying large language models (LLMs) as zero-shot passage rankers. However, few studies have explored how to select appropriate in-context demonstrations for the passage ranking task, which is the focus of this paper. Previous studies mainly apply a demonstration retriever to retrieve demonstrations and use the top-$k$ demonstrations for in-context learning (ICL). Although effective, this approach overlooks the dependencies between demonstrations, leading to inferior performance of few-shot ICL in the passage ranking task. In this paper, we formulate demonstration selection as a \textit{retrieve-then-rerank} process and introduce the DemoRank framework. In this framework, we first use LLM feedback to train a demonstration retriever, and then construct novel dependency-aware training samples to train a demonstration reranker that improves few-shot ICL. The construction of such training samples not only considers demonstration dependencies but is also efficient. Extensive experiments demonstrate DemoRank's effectiveness in in-domain scenarios and strong generalization to out-of-domain scenarios. Our code is available at~\url{https://github.com/8421BCD/DemoRank}.
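The retrieve-then-rerank formulation can be illustrated with a two-stage selector: stage 1 scores each demonstration independently and keeps the top-$k$; stage 2 builds the final list greedily, scoring each candidate conditioned on the demonstrations already chosen. A minimal sketch under loud assumptions: word-overlap (Jaccard) similarity stands in for the trained retriever, and an MMR-style "relevance minus redundancy" score stands in for DemoRank's trained dependency-aware reranker. Both stand-ins are hypothetical and only show where dependencies enter.

```python
from typing import Callable, List, Sequence

def jaccard(a: str, b: str) -> float:
    # Stand-in similarity: word-set overlap (a trained retriever would use learned embeddings).
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(query: str, pool: Sequence[str], k: int) -> List[str]:
    """Stage 1: score each demonstration independently and keep the top-k."""
    return sorted(pool, key=lambda d: jaccard(query, d), reverse=True)[:k]

def rerank(query: str, candidates: List[str], n: int,
           score: Callable[[str, str, List[str]], float]) -> List[str]:
    """Stage 2: greedy selection; each pick is scored given the demonstrations already chosen."""
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < n:
        best = max(remaining, key=lambda d: score(query, d, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def mmr_score(query: str, demo: str, chosen: List[str], lam: float = 0.5) -> float:
    # Hypothetical dependency-aware score: relevance minus redundancy with earlier picks.
    redundancy = max((jaccard(demo, c) for c in chosen), default=0.0)
    return lam * jaccard(query, demo) - (1 - lam) * redundancy

query = "capital of france"
pool = ["capital of france paris", "capital of france paris",
        "capital of spain madrid", "weather today"]
candidates = retrieve(query, pool, k=3)
picked = rerank(query, candidates, n=2, score=mmr_score)
print(picked)
```

Plain top-2 retrieval would return the duplicated demonstration twice; the conditioned second stage replaces the duplicate with a different, still relevant example.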
Submitted 24 June, 2024;
originally announced June 2024.
-
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
Authors:
Zhe Wang,
Yifei Zhu
Abstract:
Neural Radiance Fields (NeRF) is an emerging technique to synthesize 3D objects from 2D images with a wide range of potential applications. However, rendering existing NeRF models is extremely computation intensive, making it challenging to support real-time interaction on mobile devices. In this paper, we take the first initiative to examine the state-of-the-art real-time NeRF rendering technique from a system perspective. We first define the entire working pipeline of the NeRF serving system. We then identify possible control knobs that are critical to the system from the communication, computation, and visual performance perspectives. Furthermore, an extensive measurement study is conducted to reveal the effects of these control knobs on system performance. Our measurement results reveal that different control knobs contribute differently towards improving the system performance, with the mesh granularity being the most effective knob and the quantization being the least effective knob. In addition, diverse hardware device settings and network conditions have to be considered to fully unleash the benefit of operating under the appropriate knobs.
Submitted 23 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies of 4.914 and 4.946 GeV by the BESIII detector, we search for the $e^+e^- \to φχ_{c1}(3872)$ process for the first time. No significant signal is observed, and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information on the production of the $χ_{c1}(3872)$ at $e^+e^-$ colliders and deepen our understanding of the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration
Authors:
Haokun Liu,
Yaonan Zhu,
Kenji Kato,
Atsushi Tsukahara,
Izumi Kondo,
Tadayoshi Aoyama,
Yasuhisa Hasegawa
Abstract:
Large Language Models (LLMs) are gaining popularity in the field of robotics. However, LLM-based robots are limited to simple, repetitive motions due to the poor integration between language models, robots, and the environment. This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC). The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot. The system also employs a YOLO-based perception algorithm, providing visual cues to the LLM, which aids in planning feasible motions within the specific environment. Additionally, an HRC method is proposed by combining teleoperation and Dynamic Movement Primitives (DMP), allowing the LLM-based robot to learn from human guidance. Real-world experiments have been conducted using the Toyota Human Support Robot for manipulation tasks. The outcomes indicate that tasks requiring complex trajectory planning and reasoning over environments can be efficiently accomplished through the incorporation of human demonstrations.
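Dynamic Movement Primitives, which the abstract combines with teleoperation, encode a motion as a critically damped spring toward a goal $g$ plus a learned forcing term that fades with a phase variable $x$: $\tau\dot z = \alpha_z(\beta_z(g-y)-z)+f(x)$, $\tau\dot y = z$, $\tau\dot x = -\alpha_x x$. A minimal 1-D rollout of this standard discrete DMP with forward-Euler integration. The gains, basis-width rule, and weights are common defaults chosen for illustration (assumptions); fitting the weights to a human demonstration, as the paper's HRC method would, is not shown.

```python
import math
from typing import List, Sequence

def dmp_rollout(y0: float, goal: float, weights: Sequence[float], tau: float = 1.0,
                dt: float = 0.001, steps: int = 2000, alpha_z: float = 25.0,
                alpha_x: float = 3.0) -> List[float]:
    """Integrate a 1-D discrete DMP and return the position trajectory (steps + 1 samples)."""
    beta_z = alpha_z / 4.0  # critical damping of the spring
    n = len(weights)
    # Gaussian basis centres spread along the phase x, which decays from 1 toward 0.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [1.0 / max((centres[i] - centres[i + 1]) ** 2, 1e-6) for i in range(n - 1)]
    widths.append(widths[-1] if widths else 1.0)
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(steps):
        psi = [math.exp(-h * (x - c) ** 2) for c, h in zip(centres, widths)]
        # Forcing term: normalized weighted basis, scaled by phase so it vanishes near the goal.
        f = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10) * x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        traj.append(y)
    return traj

flat = dmp_rollout(0.0, 1.0, weights=[])                  # pure spring-damper to the goal
shaped = dmp_rollout(0.0, 1.0, weights=[30.0, 30.0, 30.0])  # forcing term reshapes the path
print(round(flat[-1], 3), round(shaped[-1], 3))
```

Because the forcing term decays with the phase, any learned weights change the shape of the motion but not where it ends, which is what makes DMPs convenient for replaying human-guided corrections toward new goals.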
Submitted 1 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Synthetic spin-orbit coupling for the multi-spin models in optical lattices
Authors:
Zhen Zheng,
Yan-Qing Zhu,
Shanchao Zhang,
Shi-Liang Zhu,
Z. D. Wang
Abstract:
The essential role of synthetic spin-orbit coupling in discovering new topological matter phases with cold atoms is widely acknowledged. However, the engineering of spin-orbit coupling remains unclear for arbitrary-spin models due to the complexity of spin matrices. In this work, we develop a more general but relatively straightforward method to achieve spin-orbit coupling for multi-spin models. Our approach hinges on controlling the coupling between distinct pseudo-spins through two intermediary states, resulting in tunneling with spin flips that have direction-dependent strength. The engineered spin-orbit coupling can facilitate topological phase transitions with Chern numbers over 1, a unique characteristic of multi-spin models compared to spin-1/2 models. By utilizing existing cold atom techniques, our proposed method provides an ideal platform for investigating topological properties related to large Chern numbers.
Submitted 20 June, 2024;
originally announced June 2024.
-
Causal Inference with Latent Variables: Recent Advances and Future Prospectives
Authors:
Yaochen Zhu,
Yinhan He,
Jing Ma,
Mengxuan Hu,
Sheng Li,
Jundong Li
Abstract:
Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, exogenous variables, etc.) severely compromises the reliability of CI methods. The issue may arise from the inherent difficulty in measuring the variables. Additionally, in observational studies where variables are passively recorded, certain covariates might be inadvertently omitted by the experimenter. Depending on the type of unobserved variables and the specific CI task, various consequences can be incurred if these latent variables are carelessly handled, such as biased estimation of causal effects, incomplete understanding of causal mechanisms, lack of individual-level causal consideration, etc. In this survey, we provide a comprehensive review of recent developments in CI with latent variables. We start by discussing traditional CI techniques when variables of interest are assumed to be fully observed. Afterward, under the taxonomy of circumvention and inference-based methods, we provide an in-depth discussion of various CI strategies to handle latent variables, covering the tasks of causal effect estimation, mediation analysis, counterfactual reasoning, and causal discovery. Furthermore, we generalize the discussion to graph data where interference among units may exist. Finally, we offer fresh aspects for further advancement of CI with latent variables, especially new opportunities in the era of large language models (LLMs).
Submitted 19 June, 2024;
originally announced June 2024.
-
Knowledge Graph-Enhanced Large Language Models via Path Selection
Authors:
Haochen Liu,
Song Wang,
Yaochen Zhu,
Yushun Dong,
Jundong Li
Abstract:
Large Language Models (LLMs) have shown unprecedented performance in various real-world applications. However, they are known to generate factually inaccurate outputs, a.k.a. the hallucination problem. In recent years, incorporating external knowledge extracted from Knowledge Graphs (KGs) has become a promising strategy to improve the factual accuracy of LLM-generated outputs. Nevertheless, most existing explorations rely on LLMs themselves to perform KG knowledge extraction, which is highly inflexible, as LLMs can only provide a binary judgment on whether a certain piece of knowledge (e.g., a knowledge path in the KG) should be used. In addition, LLMs tend to pick only knowledge with a direct semantic relationship to the input text, while potentially useful knowledge with indirect semantics can be ignored. In this work, we propose a principled three-stage framework, KELP, to handle the above problems. Specifically, KELP achieves finer-grained, flexible knowledge extraction by generating scores for knowledge paths with input texts via latent semantic matching. Meanwhile, knowledge paths with indirect semantic relationships to the input text can also be considered via trained encoding between the selected paths in the KG and the input text. Experiments on real-world datasets validate the effectiveness of KELP.
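The contrast the abstract draws is between a yes/no judgment per path and a graded score per path. A minimal sketch of graded path selection: verbalize each KG path, embed it and the input text, score by cosine similarity, and keep the top-$k$. The bag-of-words `embed` function is a hypothetical stand-in for KELP's trained encoder; it only captures direct lexical overlap, so it illustrates the scoring-and-selection interface, not KELP's handling of indirect semantics. The example paths are also illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]  # (head, relation, tail, relation, tail, ...)

def embed(text: str) -> Dict[str, float]:
    # Hypothetical stand-in encoder: bag-of-words counts (KELP trains an encoder instead).
    vec: Dict[str, float] = {}
    for tok in text.lower().replace("_", " ").split():
        vec[tok] = vec.get(tok, 0.0) + 1.0
    return vec

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_paths(text: str, paths: Sequence[Path], k: int) -> List[Tuple[float, Path]]:
    """Give every path a graded relevance score (not a yes/no judgment) and keep the top-k."""
    q = embed(text)
    scored = [(cosine(q, embed(" ".join(p))), p) for p in paths]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:k]

paths = [("Marie_Curie", "born_in", "Warsaw"),
         ("Marie_Curie", "field", "Physics"),
         ("Warsaw", "capital_of", "Poland")]
top = select_paths("Where was Marie Curie born", paths, k=2)
for score, path in top:
    print(round(score, 3), path)
```

The third path scores zero here even though it could help answer a follow-up question; recovering such indirectly related paths is precisely what a trained encoder is meant to add over this lexical baseline.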
Submitted 19 June, 2024;
originally announced June 2024.
-
Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments
Authors:
Yuhan Zhu,
Jian Wang,
Bing Li,
Xuxian Tang,
Hao Li,
Neng Zhang,
Yuqi Zhao
Abstract:
With the development of cloud-native technologies, microservice-based software systems face challenges in accurately localizing root causes when failures occur. Additionally, the cloud-edge collaborative environment introduces more difficulties, such as unstable networks and high latency across network segments. Accurately identifying the root cause of microservices in a cloud-edge collaborative environment has thus become an urgent problem. In this paper, we propose MicroCERCL, a novel approach that pinpoints root causes at the kernel and application levels in the cloud-edge collaborative environment. Our key insight is that failures propagate through direct invocations and indirect resource-competition dependencies in a cloud-edge collaborative environment characterized by instability and high latency. This becomes more complex in hybrid deployments that simultaneously involve multiple microservice systems. Leveraging this insight, we extract valid contents from kernel-level logs to prioritize localizing the kernel-level root cause. Moreover, we construct a heterogeneous dynamic topology stack and train a graph neural network model to accurately localize the application-level root cause without relying on historical data. Notably, we released the first benchmark hybrid deployment microservice system in a cloud-edge collaborative environment (the largest and most complex to our knowledge). Experiments conducted on the dataset collected from the benchmark show that MicroCERCL can accurately localize the root cause of microservice systems in such environments, significantly outperforming state-of-the-art approaches with an increase of at least 24.1% in top-1 accuracy.
Submitted 19 June, 2024;
originally announced June 2024.
-
One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes
Authors:
Yuchang Zhu,
Jintang Li,
Yatao Bian,
Zibin Zheng,
Liang Chen
Abstract:
Recent studies have highlighted fairness issues in Graph Neural Networks (GNNs), where they produce discriminatory predictions against specific protected groups categorized by sensitive attributes such as race and age. While various efforts to enhance GNN fairness have made significant progress, these approaches are often tailored to specific sensitive attributes. Consequently, they necessitate retraining the model from scratch to accommodate changes in the sensitive attribute requirement, resulting in high computational costs. To gain deeper insights into this issue, we approach the graph fairness problem from a causal modeling perspective, where we identify the confounding effect induced by the sensitive attribute as the underlying reason. Motivated by this observation, we formulate the fairness problem in graphs from an invariant learning perspective, which aims to learn invariant representations across environments. Accordingly, we propose a graph fairness framework based on invariant learning, namely FairINV, which enables the training of fair GNNs to accommodate various sensitive attributes within a single training session. Specifically, FairINV incorporates sensitive attribute partition and trains fair GNNs by eliminating spurious correlations between the label and various sensitive attributes. Experimental results on several real-world datasets demonstrate that FairINV significantly outperforms state-of-the-art fairness approaches, underscoring its effectiveness. Our code is available via: https://github.com/ZzoomD/FairINV/.
Submitted 19 June, 2024;
originally announced June 2024.
-
ARDuP: Active Region Video Diffusion for Universal Policies
Authors:
Shuaiyi Huang,
Mara Levy,
Zhenyu Jiang,
Anima Anandkumar,
Yuke Zhu,
Linxi Fan,
De-An Huang,
Abhinav Shrivastava
Abstract:
Sequential decision-making can be formulated as a text-conditioned video generation problem, where a video planner, guided by a text-defined goal, generates future frames visualizing planned actions, from which control actions are subsequently derived. In this work, we introduce Active Region Video Diffusion for Universal Policies (ARDuP), a novel framework for video-based policy learning that emphasizes the generation of active regions, i.e., potential interaction areas, enhancing the conditional policy's focus on interactive areas critical for task execution. This innovative framework integrates active region conditioning with latent diffusion models for video planning and employs latent representations for direct action decoding during inverse dynamic modeling. By utilizing motion cues in videos for automatic active region discovery, our method eliminates the need for manual annotations of active regions. We validate ARDuP's efficacy via extensive experiments on the CLIPort simulator and the real-world dataset BridgeData v2, achieving notable improvements in success rates and generating convincingly realistic video plans.
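The abstract says active regions are discovered automatically from motion cues but does not say how. The sketch below shows one generic motion cue: frame differencing with a hypothetical intensity threshold, applied to grayscale frames stored as nested lists, followed by a bounding box around the changed pixels. It is an illustrative stand-in, not ARDuP's discovery procedure.

```python
def active_region_mask(prev_frame, next_frame, threshold=10):
    """Flag pixels whose grayscale intensity changes by more than `threshold`."""
    return [
        [abs(b - a) > threshold for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]

def bounding_box(mask):
    """Return (top, left, bottom, right) enclosing flagged pixels, or None if none moved."""
    coords = [(r, c) for r, row in enumerate(mask) for c, moved in enumerate(row) if moved]
    if not coords:
        return None
    rows, cols = zip(*coords)
    return min(rows), min(cols), max(rows), max(cols)

# Hypothetical 4x4 frames: a small bright object appears between frames.
prev = [[0] * 4 for _ in range(4)]
nxt = [row[:] for row in prev]
nxt[1][1] = nxt[1][2] = nxt[2][2] = 255
print(bounding_box(active_region_mask(prev, nxt)))  # (1, 1, 2, 2)
```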
Submitted 19 June, 2024;
originally announced June 2024.
-
Dynamical phase-field model of cavity electromagnonic systems
Authors:
Shihao Zhuang,
Yujie Zhu,
Changchun Zhong,
Liang Jiang,
Xufeng Zhang,
Jia-Mian Hu
Abstract:
A cavity electromagnonic system, which simultaneously contains cavities for photons, magnons (quanta of spin waves), and acoustic phonons, provides an exciting platform for achieving coherent energy transduction among different physical systems down to the single-quantum level. Here we report a dynamical phase-field model that allows simulating the coupled dynamics of the electromagnetic waves, magnetization, and strain in 3D multiphase systems. As examples of application, we computationally demonstrate the excitation of hybrid magnon-photon modes (magnon polaritons), Floquet-induced magnonic Autler-Townes splitting, dynamical energy exchange (Rabi oscillation), and relative phase control (Ramsey interference) between the two magnon polariton modes. The simulation results are consistent with analytical calculations based on Floquet Hamiltonian theory. Simulations are also performed to design a cavity electro-magno-mechanical system that enables the triple phonon-magnon-photon resonance, where the resonant excitation of a chiral, fundamental (n=1) transverse acoustic phonon mode by magnon polaritons is demonstrated. With the capability to predict coupling strength, dissipation rates, and temporal evolution of photon/magnon/phonon mode profiles using fundamental materials parameters as the inputs, the present dynamical phase-field model represents a valuable computational tool to guide the fabrication of cavity electromagnonic systems and the design of operating conditions for applications in quantum sensing, transduction, and communication.
Submitted 19 June, 2024;
originally announced June 2024.
-
Human-level molecular optimization driven by mol-gene evolution
Authors:
Jiebin Fang,
Churu Mao,
Yuchen Zhu,
Xiaoming Chen,
Chang-Yu Hsieh,
Zhongjun Ma
Abstract:
De novo molecule generation allows the search for more drug-like hits across a vast chemical space. However, lead optimization is still required, and optimizing molecular structures faces the challenge of balancing structural novelty with pharmacological properties. This study introduces the Deep Genetic Molecular Modification Algorithm (DGMM), which brings structure modification to the level of medicinal chemists. DGMM uses a discrete variational autoencoder (D-VAE) to encode molecules as quantized codes, called mol-genes, which incorporate deep learning into genetic algorithms for flexible structural optimization. The mol-gene allows for the discovery of pharmacologically similar but structurally distinct compounds, and reveals the trade-offs of structural optimization in drug discovery. We demonstrate the effectiveness of DGMM in several applications.
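DGMM evolves molecules in the space of discrete D-VAE codes. The toy sketch below shows what a genetic algorithm over such codes can look like, using uniform crossover, per-position mutation to a random codebook index, and elitist selection. The codebook size, target code, and fitness function are hypothetical placeholders. A real pipeline would decode each code back to a molecule and score its pharmacological properties, which is not shown here.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each code position is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(code, codebook_size, rate, rng):
    """With probability `rate`, replace a position with a random codebook index."""
    return [rng.randrange(codebook_size) if rng.random() < rate else c for c in code]

def evolve(population, fitness, codebook_size, generations=20, rate=0.1, seed=0):
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        # Elitism: keep the fitter half unchanged, refill with mutated offspring.
        parents = sorted(population, key=fitness, reverse=True)[: max(2, size // 2)]
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), codebook_size, rate, rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

# Hypothetical toy objective: number of positions matching a target code.
target = [3, 1, 4, 1, 5, 9, 2, 6]

def fitness(code):
    return sum(c == t for c, t in zip(code, target))

init_rng = random.Random(1)
initial = [[init_rng.randrange(10) for _ in target] for _ in range(12)]
best = evolve(initial, fitness, codebook_size=10)
print(best, fitness(best))
```

Because the fitter half always survives, the best fitness never decreases from one generation to the next.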
Submitted 12 June, 2024;
originally announced June 2024.
-
Water Cherenkov muon veto for the COSINUS experiment: design and simulation optimization
Authors:
G. Angloher,
M. R. Bharadwaj,
M. Cababie,
I. Dafinei,
N. Di Marco,
L. Einfalt,
F. Ferroni,
S. Fichtinger,
A. Filipponi,
T. Frank,
M. Friedl,
Z. Ge,
M. Heikinheimo,
M. N. Hughes,
K. Huitu,
M. Kellermann,
R. Maji,
M. Mancuso,
L. Pagnanini,
F. Petricca,
S. Pirro,
F. Pröbst,
G. Profeta,
A. Puiu,
F. Reindl
, et al. (14 additional authors not shown)
Abstract:
COSINUS is a dark matter (DM) direct search experiment that uses sodium iodide (NaI) crystals as cryogenic calorimeters. Thanks to its low nuclear recoil energy threshold and event-by-event discrimination capability, COSINUS will address the long-standing DM claim made by the DAMA/LIBRA collaboration. The experiment is currently under construction at the Laboratori Nazionali del Gran Sasso, Italy, and employs a large cylindrical water tank as a passive shield to meet the required background rate. However, muon-induced neutrons can mimic a DM signal and therefore require an active veto system, which is achieved by instrumenting the water tank with an array of photomultiplier tubes (PMTs). This study optimizes the number, arrangement, and trigger conditions of the PMTs, as well as the size of an optically invisible region. The objective was to maximize the muon veto efficiency while minimizing the accidental trigger rate due to the ambient and instrumental background. The final configuration predicts a veto efficiency of $99.63 \pm 0.16\%$ and $44.4 \pm 5.6\%$ in the tagging of muon events and showers of secondary particles, respectively. The active veto will reduce the cosmogenic neutron background rate to $0.11 \pm 0.02$ cts$\cdot$kg$^{-1}\cdot$year$^{-1}$, corresponding to less than one background event in the region of interest for the whole COSINUS-1$\pi$ exposure of 1000 kg$\cdot$days.
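The closing claim of the abstract can be checked with a short unit conversion. Multiplying a rate in counts per kg per year by an exposure in kg·days, then dividing by the number of days per year, gives the expected number of events. The sketch assumes 365.25 days per year and treats the quoted ±0.02 simply as an upper rate.

```python
DAYS_PER_YEAR = 365.25  # assumption: Julian year

def expected_events(rate_per_kg_year, exposure_kg_days):
    """Expected counts from a rate in cts/(kg*year) and an exposure in kg*days."""
    return rate_per_kg_year * exposure_kg_days / DAYS_PER_YEAR

central = expected_events(0.11, 1000)        # central rate quoted in the abstract
upper = expected_events(0.11 + 0.02, 1000)   # rate at the top of the quoted uncertainty
print(f"{central:.2f} events (up to {upper:.2f})")  # 0.30 events (up to 0.36)
```

Both values stay well below one event, consistent with the abstract's statement for the full 1000 kg·day exposure.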
Submitted 25 April, 2024;
originally announced June 2024.
-
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Authors:
Haoqiu Yan,
Yongxin Zhu,
Kai Zheng,
Bing Liu,
Haoyu Cao,
Deqiang Jiang,
Linli Xu
Abstract:
Large Language Model (LLM)-enhanced agents are becoming increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions, resulting in inconsistent or even contradictory responses within dialogues. To bridge this gap, in this paper, we propose PerceptiveAgent, an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings beyond the literal interpretations of words through the integration of speech modality perception. Employing LLMs as a cognitive core, PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language. Experimental results indicate that PerceptiveAgent excels in contextual understanding by accurately discerning the speakers' true intentions in scenarios where the linguistic meaning is either contrary to or inconsistent with the speaker's true feelings, producing more nuanced and expressive spoken dialogues. Code is publicly available at: https://github.com/Haoqiu-Yan/PerceptiveAgent.
Submitted 18 June, 2024;
originally announced June 2024.
-
Bridging Local Details and Global Context in Text-Attributed Graphs
Authors:
Yaoke Wang,
Yun Zhu,
Wenqiao Zhang,
Yueting Zhuang,
Yunfei Li,
Siliang Tang
Abstract:
Representation learning on text-attributed graphs (TAGs) is vital for real-world applications, as they combine semantic textual and contextual structural information. Research in this field generally consists of two main perspectives, local-level encoding and global-level aggregating, which respectively refer to textual node information unification (e.g., using Language Models) and structure-augmented modeling (e.g., using Graph Neural Networks). Most existing works focus on combining different information levels but overlook the interconnections, i.e., the contextual textual information among nodes, which provides semantic insights to bridge local and global levels. In this paper, we propose GraphBridge, a multi-granularity integration framework that bridges local and global perspectives by leveraging contextual textual information, enhancing fine-grained understanding of TAGs. In addition, to tackle scalability and efficiency challenges, we introduce a graph-aware token reduction module. Extensive experiments across various models and datasets show that our method achieves state-of-the-art performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
Submitted 18 June, 2024;
originally announced June 2024.
-
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
Authors:
Shuting Wang,
Xin Yu,
Mang Wang,
Weipeng Chen,
Yutao Zhu,
Zhicheng Dou
Abstract:
Retrieval-augmented generation (RAG) effectively addresses issues of static knowledge and hallucination in large language models. Existing studies mostly focus on question scenarios with clear user intents and concise answers. However, users frequently issue broad, open-ended queries with diverse sub-intents, for which they desire rich, long-form answers covering multiple relevant aspects. To tackle this important yet underexplored problem, we propose a novel RAG framework, namely RichRAG. It includes a sub-aspect explorer to identify potential sub-aspects of input questions, a multi-faceted retriever to build a candidate pool of diverse external documents related to these sub-aspects, and a generative list-wise ranker, which is a key module to provide the top-k most valuable documents for the final generator. These ranked documents sufficiently cover various query aspects and are aware of the generator's preferences, hence incentivizing it to produce rich and comprehensive responses for users. The training of our ranker involves a supervised fine-tuning stage to ensure basic coverage of documents, and a reinforcement learning stage to align the ranking of documents with the downstream LLM's preferences. Experimental results on two publicly available datasets show that our framework effectively and efficiently provides comprehensive and satisfying responses to users.
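RichRAG's list-wise ranker is a trained generative model, and the sketch below is not that ranker. It only illustrates the coverage objective the abstract describes, which is selecting top-k documents that together span many sub-aspects. It does this with a plain greedy max-coverage heuristic over hypothetical annotations that map each document to sub-aspects.

```python
def greedy_coverage_rank(candidates, k):
    """Return up to k document ids, each time picking the document that adds the
    most not-yet-covered sub-aspects (ties go to the earlier candidate)."""
    covered, ranking = set(), []
    remaining = dict(candidates)
    while remaining and len(ranking) < k:
        doc_id = max(remaining, key=lambda d: len(remaining[d] - covered))
        ranking.append(doc_id)
        covered |= remaining.pop(doc_id)
    return ranking

# Hypothetical candidate pool: document id -> sub-aspects it addresses.
docs = {
    "d1": {"history", "causes"},
    "d2": {"causes"},
    "d3": {"treatment", "prevention"},
    "d4": {"history", "treatment"},
}
print(greedy_coverage_rank(docs, k=2))  # ['d1', 'd3']
```

With k=2, the greedy choice covers all four sub-aspects, whereas a relevance-only ranker could return two documents about the same aspect.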
Submitted 21 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representation
Authors:
Chao Ni,
Liyu Shen,
Xiaohu Yang,
Yan Zhu,
Shaohua Wang
Abstract:
We constructed a new large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt advanced tools to ensure the integrity of the extracted code and enrich the code with four different transformed representations. In total, MegaVul contains 17,380 vulnerabilities collected from 992 open-source repositories spanning 169 different vulnerability types disclosed from January 2006 to October 2023. Thus, MegaVul can be used for a variety of software security-related tasks, including detecting vulnerabilities and assessing vulnerability severity. All information is stored in JSON format for ease of use. MegaVul is publicly available on GitHub and will be continuously updated. It can also be easily extended to other programming languages.
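Because MegaVul stores its records as JSON, simple aggregate statistics can be computed with the Python standard library. The field names (`cve_id`, `cwe_id`, `repo`) and the placeholder records below are hypothetical and may not match the dataset's actual schema. Check the repository before relying on them.

```python
import json
from collections import Counter

# Hypothetical record layout; MegaVul's actual JSON schema may use different keys.
RAW = """[
  {"cve_id": "CVE-0000-0001", "cwe_id": "CWE-787", "repo": "example/a"},
  {"cve_id": "CVE-0000-0002", "cwe_id": "CWE-416", "repo": "example/b"},
  {"cve_id": "CVE-0000-0003", "cwe_id": "CWE-787", "repo": "example/a"}
]"""

def count_by_field(records, field):
    """Count records per value of `field`, using "unknown" when the key is absent."""
    return Counter(record.get(field, "unknown") for record in records)

records = json.loads(RAW)
print(count_by_field(records, "cwe_id").most_common(1))  # [('CWE-787', 2)]
print(len(count_by_field(records, "repo")))              # 2 distinct repositories
```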
Submitted 18 June, 2024;
originally announced June 2024.
-
Scintillation velocity and arc observations of FRB 20201124A
Authors:
Ziwei Wu,
Weiwei Zhu,
Bing Zhang,
Yi Feng,
**Lin Han,
Di Li,
Dongzi Li,
Rui Luo,
Chenhui Niu,
Jiarui Niu,
Bojun Wang,
Fayin Wang,
Pei Wang,
Weiyang Wang,
Heng Xu,
Yuanpei Yang,
Yongkun Zhang,
Dejiang Zhou,
Yuhao Zhu,
Can-Min Deng,
Yonghua Xu
Abstract:
We present the scintillation velocity measurements of FRB 20201124A from the FAST observations, which reveal an annual variation. This annual variation is further supported by changes detected in the scintillation arc as observed from the secondary spectrum. We attribute the annual velocity variation to the presence of a moderately anisotropic scattering screen located at a distance of $0.4\pm0.1$ kpc from Earth. Our results prove that the scintillation of this FRB is mainly caused by material close to Earth on a Galactic scale. However, scintillation observations of other FRBs may expose their surrounding environment or uncover possible orbital motion if scintillation is caused by materials in their host galaxy.
Submitted 17 June, 2024;
originally announced June 2024.
-
An integrated electro-optically tunable multi-channel interference cavity laser
Authors:
Junxia Zhou,
Yiran Zhu,
Botao Fu,
**ming Chen,
Huiting Song,
Zhihao Zhang,
Jian** Yu,
Jian Liu,
Min Wang,
Jia Qi,
Ya Cheng
Abstract:
We demonstrated a continuously tunable laser system by butt coupling a reflective semiconductor optical amplifier (RSOA) chip with a thin-film lithium niobate (TFLN) based multi-channel interference (MCI) cavity chip. This hybrid integrated laser allows fine-tuning of the laser wavelength from 1538 nm to 1560 nm with a resolution of 0.014 nm and a side-mode suppression ratio (SMSR) exceeding 30 dB. The MCI cavity chip is fabricated using the photolithography assisted chemo-mechanical etching (PLACE) technique. The developed laser has an output power of approximately 10 μW, which can be further amplified to 70 mW using a commercial erbium-doped fiber amplifier (EDFA) without significant broadening of the laser linewidth.
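Two figures in the abstract translate into quick derived quantities. One is the number of resolution steps across the tuning range. The other is the EDFA gain in decibels implied by amplifying roughly 10 μW to 70 mW. The sketch treats the approximate 10 μW figure as exact, so both results are approximate.

```python
import math

def gain_db(p_out_w, p_in_w):
    """Power gain in decibels: 10 * log10(P_out / P_in)."""
    return 10 * math.log10(p_out_w / p_in_w)

tuning_range_nm = 1560 - 1538  # 22 nm, from the abstract
resolution_nm = 0.014
steps = tuning_range_nm / resolution_nm
print(round(steps))                     # 1571
print(round(gain_db(70e-3, 10e-6), 1))  # 38.5
```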
Submitted 17 June, 2024;
originally announced June 2024.
-
Model Selection for Causal Modeling in Missing Exposure Problems
Authors:
Yuliang Shi,
Yeying Zhu,
Joel A. Dubin
Abstract:
In causal inference, properly selecting the propensity score (PS) model is a popular topic that has been widely investigated in observational studies. In addition, there is a large literature concerning the missing data problem. However, very few studies investigate the model selection issue for causal inference when the exposure is missing at random (MAR). In this paper, we discuss how to select both the imputation and PS models so as to obtain the smallest root-mean-square error (RMSE) of the estimated causal effect. Then, we provide a new criterion, called the "rank score", for evaluating the overall performance of both models. The simulation studies show that the full imputation plus the outcome-related PS models lead to the smallest RMSE, and that the rank score can also pick the best models. An application study is conducted to investigate the causal effect of CVD on the mortality of COVID-19 patients.
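The model-selection criterion in the abstract is the RMSE of the estimated causal effect across simulation replicates. The sketch below shows that comparison using hypothetical model-pair names and made-up replicate estimates. It does not reproduce the paper's rank score, because the abstract does not define it.

```python
import math

def rmse(estimates, true_effect):
    """Root-mean-square error of replicate estimates around the true causal effect."""
    return math.sqrt(sum((e - true_effect) ** 2 for e in estimates) / len(estimates))

# Hypothetical simulation output: effect estimates from three replicates per model pair.
TRUE_EFFECT = 2.0
estimates = {
    "model_pair_A": [1.9, 2.1, 2.0],
    "model_pair_B": [1.5, 2.6, 2.2],
}
scores = {name: rmse(vals, TRUE_EFFECT) for name, vals in estimates.items()}
best = min(scores, key=scores.get)
print(best)  # model_pair_A
```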
Submitted 17 June, 2024;
originally announced June 2024.
-
Efficient Sequential Decision Making with Large Language Models
Authors:
Dingyang Chen,
Qi Zhang,
Yinglun Zhu
Abstract:
This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either (i) re-train or finetune LLMs for decision making, or (ii) design prompts for pretrained LLMs. The former approach suffers from the computational burden of gradient updates, and the latter approach does not show promising results. In this paper, we propose a new approach that leverages online model selection algorithms to efficiently incorporate LLM agents into sequential decision making. Statistically, our approach significantly outperforms both traditional decision making algorithms and vanilla LLM agents. Computationally, our approach avoids the need for expensive gradient updates of LLMs, and throughout the decision making process, it requires only a small number of LLM calls. We conduct extensive experiments to verify the effectiveness of our proposed approach. As an example, on a large-scale Amazon dataset, our approach achieves more than a $6\times$ performance gain over baselines while calling LLMs in only $1.5\%$ of the time steps.
Submitted 17 June, 2024;
originally announced June 2024.
-
A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving
Authors:
Yang Lou,
Yi Zhu,
Qun Song,
Rui Tan,
Chunming Qiao,
Wei-Bin Lee,
Jian** Wang
Abstract:
Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack approach that induces prediction errors via attacks against the perception module of a victim AV. Although it has been shown that physically realizable attacks against LiDAR-based perception are possible by placing a few objects at strategic locations, it is still an open challenge to find an object location from the vast search space in order to launch effective attacks against prediction under varying victim AV velocities.
Through analysis, we observe that a prediction model is prone to an attack focusing on a single point in the scene. Consequently, we propose a novel two-stage attack framework to realize the single-point attack. The first stage of prediction-side attack efficiently identifies, guided by the distribution of detection results under object-based attacks against perception, the state perturbations for the prediction model that are effective and velocity-insensitive. In the second stage of location matching, we match the feasible object locations with the found state perturbations. Our evaluation using a public autonomous driving dataset shows that our attack causes a collision rate of up to 63% and various hazardous responses of the victim AV. The effectiveness of our attack is also demonstrated on a real testbed car. To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction. To counteract the proposed attack, potential defenses are discussed.
Submitted 17 June, 2024;
originally announced June 2024.
-
m-QMDS codes over mixed alphabets via orthogonal arrays
Authors:
Shanqi Pang,
Mengqian Chen,
Rong Yan,
Yan Zhu
Abstract:
The construction of quantum error-correcting codes (QECCs) with good parameters is a hot topic in quantum information and quantum computing. Quantum maximum distance separable (QMDS) codes are optimal because their minimum distance cannot be improved for a given length and code size. QMDS codes over mixed alphabets are rarely known; even the existence and construction of QECCs over mixed alphabets with minimum distance at least three remain an open question. In this paper, we define an $m$-QMDS code over mixed alphabets, which is a generalization of QMDS codes. We establish a relation between $m$-QMDS codes over mixed alphabets and asymmetrical orthogonal arrays (OAs) with orthogonal partitions. Using this relation, we propose a general method to construct $m$-QMDS codes. As applications of this method, numerous infinite families of $m$-QMDS codes over mixed alphabets can be constructed explicitly. Compared with existing codes, the constructed codes offer more flexibility in the choice of parameters, such as the alphabet sizes, length, and dimension of the encoding state.
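For a single alphabet, an [[n, k, d]] code obeys the quantum Singleton bound n - k >= 2(d - 1), and QMDS codes are exactly those that meet it with equality. The sketch below checks this standard single-alphabet condition. It does not implement the paper's mixed-alphabet m-QMDS generalization, whose definition the abstract does not give.

```python
def satisfies_quantum_singleton(n, k, d):
    """Quantum Singleton bound for an [[n, k, d]] code: n - k >= 2 * (d - 1)."""
    return n - k >= 2 * (d - 1)

def is_qmds(n, k, d):
    """A QMDS code attains the quantum Singleton bound with equality."""
    return n - k == 2 * (d - 1)

print(is_qmds(5, 1, 3))  # True: the five-qubit [[5, 1, 3]] code
print(is_qmds(7, 1, 3), satisfies_quantum_singleton(7, 1, 3))  # False True: Steane code
```

The Steane code respects the bound with slack, so it is a valid code but not QMDS. The five-qubit code has no slack, so its minimum distance cannot be improved at that length and dimension.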
Submitted 15 June, 2024;
originally announced June 2024.
-
A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing
Authors:
Ming Meng,
Yufei Zhao,
Bo Zhang,
Yonggui Zhu,
Weimin Shi,
Maxwell Wen,
Zhaoxin Fan
Abstract:
Talking head synthesis, an advanced method for generating portrait videos from a still image driven by specific content, has garnered widespread attention in virtual reality, augmented reality and game production. Recently, significant breakthroughs have been made with the introduction of novel models such as the transformer and the diffusion model. Current methods can not only generate new content but also edit the generated material. This survey systematically reviews the technology, categorizing it into three pivotal domains: portrait generation, driving mechanisms, and editing techniques. We summarize milestone studies and critically analyze their innovations and shortcomings within each domain. Additionally, we organize an extensive collection of datasets and provide a thorough performance analysis of current methodologies based on various evaluation metrics, aiming to furnish a clear framework and robust data support for future research. Finally, we explore application scenarios of talking head synthesis, illustrate them with specific cases, and examine potential future directions.
Submitted 18 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
An Extended Validity Domain for Constraint Learning
Authors:
Yilin Zhu,
Samuel Burer
Abstract:
We consider embedding a predictive machine-learning model within a prescriptive optimization problem. In this setting, called constraint learning, we study the concept of a validity domain, i.e., a constraint added to the feasible set, which keeps the optimization close to the training data, thus helping to ensure that the computed optimal solution exhibits less prediction error. In particular, we propose a new validity domain which uses a standard convex-hull idea but in an extended space. We investigate its properties and compare it empirically with existing validity domains on a set of test problems for which the ground truth is known. Results show that our extended convex hull routinely outperforms existing validity domains, especially in terms of the function value error, that is, it exhibits closer agreement between the true function value and the predicted function value at the computed optimal solution. We also consider our approach within two stylized optimization models, which show that our method reduces feasibility error, as well as a real-world pricing case study.
Submitted 14 June, 2024;
originally announced June 2024.
-
SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms
Authors:
Yifei Chen,
Zhu Zhu,
Shenghao Zhu,
Linwei Qiu,
Binfeng Zou,
Fan Jia,
Yunpeng Zhu,
Chenyan Zhang,
Zhaojie Fang,
Feiwei Qin,
** Fan,
Changmiao Wang,
Yu Gao,
Gang Yu
Abstract:
The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer.
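In a Kolmogorov-Arnold Network (KAN) layer, each input-output edge carries its own learnable univariate function, and each output sums these functions over the inputs. This replaces the fixed activations and linear weights of an MLP layer. The forward-pass sketch below parameterizes each edge function as a weighted sum of Gaussian radial basis functions. That basis is a simplifying assumption (the original KAN formulation uses B-splines), and the coefficients, centers, and width are hypothetical. It is not SCKansformer's Kansformer Encoder.

```python
import math

def rbf_basis(x, centers, width):
    """Evaluate Gaussian radial basis functions centred at `centers` for a scalar x."""
    return [math.exp(-(((x - c) / width) ** 2)) for c in centers]

def kan_layer(x, coeffs, centers, width=1.0):
    """KAN-style forward pass.

    coeffs[j][i][b] is the weight of basis function b on the edge from input i
    to output j, so output j = sum_i phi_{j,i}(x[i]) with
    phi_{j,i}(t) = sum_b coeffs[j][i][b] * basis_b(t).
    """
    basis = [rbf_basis(xi, centers, width) for xi in x]  # one basis vector per input
    outputs = []
    for coeffs_j in coeffs:  # one entry per output unit
        total = 0.0
        for coeffs_ji, basis_i in zip(coeffs_j, basis):
            total += sum(c * b for c, b in zip(coeffs_ji, basis_i))
        outputs.append(total)
    return outputs

# Hypothetical layer: 2 inputs -> 1 output, a single basis function centred at 0.
out = kan_layer([0.0, 1.0], coeffs=[[[1.0], [2.0]]], centers=[0.0])
print(out)  # [1 + 2 * exp(-1)], roughly [1.736]
```

Training would adjust `coeffs`, which reshapes each edge's function rather than a single scalar weight. This is the property the abstract credits with improving nonlinear representation and interpretability.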
Submitted 14 June, 2024;
originally announced June 2024.