-
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
Authors:
Yilong Chen,
Linhao Zhang,
Junyuan Shang,
Zhenyu Zhang,
Tingwen Liu,
Shuohuan Wang,
Yu Sun
Abstract:
Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or require substantial continued pre-training to restore performance. Based on an analysis of attention redundancy, we design a Decoupled-Head Attention (DHA) mechanism. DHA adaptively configures group sharing for key heads and value heads across layers, achieving a better balance between performance and efficiency. Inspired by the observation that similar heads form clusters, we propose to progressively transform an MHA checkpoint into a DHA model through linear fusion of similar head parameters, retaining the parametric knowledge of the MHA checkpoint. We construct DHA models by transforming MHA checkpoints of various scales under given target head budgets. Our experiments show that DHA requires a mere 0.25\% of the original model's pre-training budget to achieve 97.6\% of its performance while saving 75\% of the KV cache. Compared to Grouped-Query Attention (GQA), DHA achieves a 5$\times$ training acceleration, up to a 13.93\% performance improvement under a 0.01\% pre-training budget, and a 4\% relative improvement under a 0.05\% pre-training budget.
Submitted 3 June, 2024;
originally announced June 2024.
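The KV-cache saving and the head-fusion idea above can be illustrated with a small sketch. It assumes a standard decoder whose KV cache stores one key and one value vector per KV head, per layer, per token, and it fuses flattened per-head projection weights with a fixed linear combination. The helper names, the uniform default coefficients, and the example sizes are hypothetical; DHA itself learns the fusion progressively and picks group sizes per layer, which this sketch does not attempt.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """KV-cache size; the factor 2 accounts for storing both keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


def fuse_heads(head_weights, groups, mix=None):
    """Linearly fuse flattened per-head weight vectors into one head per group.

    head_weights: list of equal-length lists, one flattened projection per head
    groups: list of lists of head indices, e.g. [[0, 1], [2, 3]]
    mix: optional dict head_index -> fusion coefficient (hypothetical);
         defaults to uniform averaging within each group
    """
    fused = []
    for group in groups:
        coeffs = [mix[h] for h in group] if mix else [1.0 / len(group)] * len(group)
        dim = len(head_weights[group[0]])
        fused.append([
            sum(c * head_weights[h][i] for c, h in zip(coeffs, group))
            for i in range(dim)
        ])
    return fused
```

Any configuration that keeps a quarter of the total KV heads, for example 8 instead of 32 per layer, yields the 75% cache saving quoted above.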
-
Data Augmentation for Multivariate Time Series Classification: An Experimental Study
Authors:
Romain Ilbert,
Thai V. Hoang,
Zonghua Zhang
Abstract:
Our study investigates the impact of data augmentation on the performance of multivariate time series models, focusing on datasets from the UCR archive. Despite the limited size of these datasets, we achieved classification accuracy improvements on 10 of 13 datasets using the Rocket and InceptionTime models. This highlights the essential role of sufficient data in training effective models, paralleling advances in computer vision. We adapt and apply existing augmentation methods in new ways to multivariate time series classification. Our comprehensive exploration of these techniques sets a new standard for addressing data scarcity in time series analysis, emphasizing that diverse augmentation strategies are crucial for unlocking the potential of both traditional and deep learning models. Moreover, by systematically analyzing and applying a variety of augmentation techniques, we demonstrate that strategic data enrichment can enhance model accuracy. This not only establishes a benchmark for future research in time series analysis but also underscores the importance of varied augmentation approaches for improving model performance when data are limited.
Submitted 10 June, 2024;
originally announced June 2024.
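The abstract does not list the specific augmentation techniques, so the sketch below uses two common ones for time series, additive Gaussian jittering and per-channel magnitude scaling, purely as an illustration. A multivariate series is assumed to be a list of channels, each a list of floats; the function names and noise levels are hypothetical.

```python
import random


def jitter(series, rng, sigma=0.03):
    """Add i.i.d. Gaussian noise to every time step of every channel."""
    return [[x + rng.gauss(0.0, sigma) for x in channel] for channel in series]


def scale(series, rng, sigma=0.1):
    """Multiply each channel by its own random factor drawn around 1.0."""
    out = []
    for channel in series:
        factor = rng.gauss(1.0, sigma)
        out.append([x * factor for x in channel])
    return out


def augment_dataset(X, y, n_copies=1, seed=0):
    """Return the original samples plus n_copies augmented copies of each."""
    rng = random.Random(seed)  # seeded so augmented sets are reproducible
    X_aug, y_aug = list(X), list(y)
    for _ in range(n_copies):
        for series, label in zip(X, y):
            X_aug.append(scale(jitter(series, rng), rng))
            y_aug.append(label)
    return X_aug, y_aug
```

Labels are copied unchanged, which assumes the transformations preserve class identity; that assumption is worth checking per dataset.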
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples, collected with the BESIII detector. Strong-$CP$ symmetry is tested in $Σ^0$ hyperon decays for the first time by measuring the decay parameters $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed, with opposite directions, and the ratios between the S-wave and D-wave contributions to the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial for understanding the decay dynamics of the charmonium states and the production mechanism of $Σ^0-\barΣ^0$ pairs.
Submitted 10 June, 2024;
originally announced June 2024.
-
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
Authors:
Zijian Chen,
Wei Sun,
Yuan Tian,
Jun Jia,
Zicheng Zhang,
Jiarui Wang,
Ru Huang,
Xiongkuo Min,
Guangtao Zhai,
Wenjun Zhang
Abstract:
Assessing action quality is both imperative and challenging because of its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated video (AIGV). Current action quality assessment (AQA) algorithms predominantly focus on actions from specific real-world scenarios and are pre-trained with normative action features, rendering them inapplicable to AIGVs. To address these problems, we construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective, resulting in 971,244 ratings among 9,180 video-action pairs. Based on GAIA, we evaluate a suite of popular text-to-video (T2V) models on their ability to generate visually rational actions, revealing their strengths and weaknesses across different categories of actions. We also extend GAIA as a testbed to benchmark the AQA capacity of existing automatic evaluation methods. Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods correlate poorly with human opinions, indicating a sizable gap between current models and human action perception patterns in AIGVs. Our findings underscore the significance of action quality as a unique perspective for studying AIGVs and can catalyze progress toward methods with enhanced AQA capabilities for AIGVs.
Submitted 10 June, 2024;
originally announced June 2024.
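The abstract reports that automatic methods "correlate poorly with human opinions" without naming the statistic. Spearman rank correlation (SRCC) is a common choice for comparing predicted quality scores with human ratings, so the sketch below computes it, giving tied values their average rank. The function names are illustrative, not taken from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(metric_scores, human_scores):
    """SRCC: Pearson correlation computed on ranks instead of raw scores."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```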
-
Modeling User Retention through Generative Flow Networks
Authors:
Ziru Liu,
Shuchang Liu,
Bin Yang,
Zhenghai Xue,
Qingpeng Cai,
Xiangyu Zhao,
Zijian Zhang,
Lantao Hu,
Han Li,
Peng Jiang
Abstract:
Recommender systems aim to fulfill users' daily demands. While most existing research focuses on maximizing users' engagement with the system, it has recently been pointed out that how frequently users come back to the service also reflects the quality and stability of recommendations. However, optimizing this user retention behavior is non-trivial and poses several challenges, including intractable leave-and-return user activities, sparse and delayed signals, and uncertain relations between users' retention and their immediate feedback toward each item in the recommendation list. In this work, we regard the retention signal as an overall estimate of the user's end-of-session satisfaction and propose to estimate this signal through a probabilistic flow. This flow-based modeling technique can back-propagate the retention reward to each recommended item in the user session, and we show that the flow combined with traditional learning-to-rank objectives ultimately optimizes a non-discounted cumulative reward for both immediate user feedback and user retention. We verify the effectiveness of our method through both offline empirical studies on two public datasets and online A/B tests on an industrial platform.
Submitted 10 June, 2024;
originally announced June 2024.
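Two quantities from the abstract can be written down concretely. The first is a non-discounted cumulative reward that adds a session's immediate per-item feedback to an end-of-session retention reward. The second is a flow-matching objective; the sketch uses the trajectory balance loss from the GFlowNet literature as one standard way to make the probability of generating a recommendation list proportional to its reward. Whether the paper uses this exact objective, and how it defines the retention reward, are assumptions here. When a list is built item by item, each partial list has a single parent, so the backward log-probabilities default to zero.

```python
import math


def cumulative_reward(immediate_rewards, retention_reward):
    """Non-discounted sum of per-item feedback plus the session-level retention reward."""
    return sum(immediate_rewards) + retention_reward


def trajectory_balance_loss(log_z, log_pf_steps, reward, log_pb_steps=None):
    """Squared trajectory-balance residual for one generated list.

    log_z: learned log partition function (scalar)
    log_pf_steps: log-probabilities of each forward (item selection) step
    reward: positive terminal reward, e.g. a retention-based score
    log_pb_steps: backward log-probabilities; zero when every state has one parent
    """
    log_pb = sum(log_pb_steps) if log_pb_steps else 0.0
    residual = log_z + sum(log_pf_steps) - math.log(reward) - log_pb
    return residual ** 2
```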
-
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
Authors:
Shijie Lian,
Ziyi Zhang,
Hua Li,
Wenjie Li,
Laurence Tianruo Yang,
Sam Kwong,
Runmin Cong
Abstract:
With the breakthrough of large models, the Segment Anything Model (SAM) and its extensions have been applied to diverse computer vision tasks. Underwater salient instance segmentation is a foundational and vital step for various underwater vision tasks, yet it often suffers from low segmentation accuracy due to complex underwater conditions and the limited adaptability of models. Moreover, the lack of large-scale datasets with pixel-level salient instance annotations has impeded the development of machine learning techniques in this field. To address these issues, we construct the first large-scale underwater salient instance segmentation dataset (USIS10K), which contains 10,632 underwater images with pixel-level annotations in 7 categories from various underwater scenes. We then propose an Underwater Salient Instance Segmentation architecture based on the Segment Anything Model (USIS-SAM), designed specifically for the underwater domain. We devise an Underwater Adaptive Visual Transformer (UA-ViT) encoder to incorporate underwater domain visual prompts into the segmentation network. We further design an out-of-the-box underwater Salient Feature Prompter Generator (SFPG) that automatically generates salient prompters, instead of requiring foreground points or boxes to be provided explicitly as prompts, as in SAM. Comprehensive experimental results show that USIS-SAM achieves superior performance on the USIS10K dataset compared to state-of-the-art methods. The dataset and code are released at https://github.com/LiamLian0727/USIS10K.
Submitted 10 June, 2024;
originally announced June 2024.
-
BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation
Authors:
Zihan Zhang,
Xianjun Xia,
Chuanzeng Huang,
Yijian Xiao,
Lei Xie
Abstract:
Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it performed strongly in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents an updated version, BS-PLCNet 2, which further reduces computational complexity and improves performance. Specifically, to compensate for the missing future information, we design a dual-path encoder structure (with non-causal and causal paths) in the wide-band module and leverage an intra-model knowledge distillation strategy to distill future information from the non-causal teacher into the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to repair speech distortions and remove residual noise in the audio signal. With only 40% of the parameters of the original BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.
Submitted 9 June, 2024;
originally announced June 2024.
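The role of the non-causal teacher can be shown on a toy signal. In the sketch below, a causal moving average (current and past frames only) stands in for the student path and a centred moving average (which also sees future frames) stands in for the teacher path; a mean squared error then pulls the student toward the teacher. The smoothing filters and the MSE objective are assumptions chosen for illustration, not the paper's encoder or loss.

```python
def causal_smooth(x, k=3):
    """Average of the current and up to k-1 past frames (no look-ahead)."""
    out = []
    for t in range(len(x)):
        window = x[max(0, t - k + 1):t + 1]
        out.append(sum(window) / len(window))
    return out


def noncausal_smooth(x, k=3):
    """Centred average that also looks k // 2 frames into the future."""
    half = k // 2
    out = []
    for t in range(len(x)):
        window = x[max(0, t - half):t + half + 1]
        out.append(sum(window) / len(window))
    return out


def distillation_loss(student_feats, teacher_feats):
    """Mean squared error pulling student features toward fixed teacher targets."""
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
```

For the input `[0, 0, 0, 3, 0]`, the causal path outputs 0 at frame 2 because it cannot see the upcoming 3, while the non-causal path outputs 1.0; that gap is the future information the distillation loss asks the student to anticipate. In training, the teacher outputs would be treated as constants so that only the causal path is updated.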
-
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Authors:
Yixin Song,
Haotong Xie,
Zhengyan Zhang,
Bo Wen,
Li Ma,
Zeyu Mi,
Haibo Chen
Abstract:
Exploiting activation sparsity is a promising approach to significantly accelerating the inference of large language models (LLMs) without compromising performance. However, activation sparsity is determined by activation functions, and commonly used ones like SwiGLU and GeGLU exhibit limited sparsity. Simply replacing these functions with ReLU fails to achieve sufficient sparsity. Moreover, inadequate training data can further increase the risk of performance degradation. To address these challenges, we propose a novel dReLU function, designed to improve LLM activation sparsity, along with a high-quality training data mixture ratio to facilitate effective sparsification. Additionally, we leverage sparse activation patterns within the Feed-Forward Network (FFN) experts of Mixture-of-Experts (MoE) models to further boost efficiency. By applying our neuron sparsification method to the Mistral and Mixtral models, only 2.5 billion and 4.3 billion parameters, respectively, are activated per inference iteration, while achieving even stronger model performance. Evaluation results demonstrate that this sparsity achieves a 2-5x decoding speedup. Remarkably, on mobile phones, our TurboSparse-Mixtral-47B achieves an inference speed of 11 tokens per second. Our models are available at https://huggingface.co/PowerInfer.
Submitted 10 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
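The sparsity argument can be checked on a toy FFN hidden layer. The sketch compares the element-wise activations of SwiGLU, a ReLU-gated GLU, and a dReLU-style variant. The abstract does not define dReLU; the sketch assumes it applies ReLU to both the gate and the up projection, so treat that function as an assumption. Projections are omitted: `gate` and `up` stand for the two pre-activation vectors.

```python
import math


def silu(x):
    return x / (1.0 + math.exp(-x))


def relu(x):
    return x if x > 0 else 0.0


def swiglu(gate, up):
    # SiLU is non-zero for every non-zero input, so almost nothing is exactly zero.
    return [silu(g) * u for g, u in zip(gate, up)]


def relu_glu(gate, up):
    # Zero wherever the gate pre-activation is non-positive.
    return [relu(g) * u for g, u in zip(gate, up)]


def drelu(gate, up):
    # Assumed dReLU form: zero unless both gate and up are positive.
    return [relu(g) * relu(u) for g, u in zip(gate, up)]


def sparsity(acts):
    """Fraction of activations that are exactly zero."""
    return sum(1 for a in acts if a == 0.0) / len(acts)
```

Each zero entry means the matching row of the down projection can be skipped during decoding, which is where the speedup comes from. With independent zero-mean inputs, the assumed dReLU form zeroes about three quarters of the entries in expectation, versus about half for a ReLU gate.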
-
Async Learned User Embeddings for Ads Delivery Optimization
Authors:
Mingwei Tang,
Meng Liu,
Hong Li,
Junjie Yang,
Chenglin Wei,
Boyang Li,
Dai Li,
Rengan Xu,
Yifan Xu,
Zehua Zhang,
Xiangyu Wang,
Linfeng Liu,
Yuelei Xie,
Chengye Liu,
Labib Fawaz,
Li Li,
Hongnan Wang,
Bill Zhu,
Sri Reddy
Abstract:
In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embeddings. We propose to asynchronously learn high-fidelity user embeddings for billions of users each day from sequence-based multimodal user activities through a Transformer-like large-scale feature learning module. The asynchronously learned user representation embeddings (ALURE) are further converted into user similarity graphs through graph learning and then combined with users' real-time activities to retrieve highly relevant ad candidates for the ads delivery system. Our method shows significant gains in both offline and online experiments.
Submitted 23 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
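The step from user embeddings to a similarity graph to ad candidates can be sketched in its simplest form. The abstract says the graph comes from "graph learning"; the sketch swaps in a plain cosine k-nearest-neighbor graph and retrieves ads that a user's neighbors engaged with. Brute-force search is O(n²) and would not scale to billions of users, where approximate nearest-neighbor indexes are typical. All names and data shapes are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(embeddings, k=2):
    """Link each user to its k most cosine-similar users (brute force)."""
    graph = {}
    for uid, emb in embeddings.items():
        scored = [(cosine(emb, other), oid) for oid, other in embeddings.items() if oid != uid]
        scored.sort(reverse=True)
        graph[uid] = [oid for _, oid in scored[:k]]
    return graph


def retrieve_candidates(graph, user_id, engaged_ads, limit=10):
    """Collect ads engaged with by a user's neighbors that the user has not seen."""
    seen = set(engaged_ads.get(user_id, ()))
    candidates = []
    for neighbor in graph.get(user_id, []):
        for ad in engaged_ads.get(neighbor, ()):
            if ad not in seen:
                seen.add(ad)
                candidates.append(ad)
    return candidates[:limit]
```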
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large-angle Bhabha scattering events. The uncertainties are dominated by systematic effects; the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
Submitted 9 June, 2024;
originally announced June 2024.
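For reference, a Bhabha-based luminosity determination rests on the standard relation below, where $N^{\rm obs}_{\rm Bhabha}$ is the number of selected Bhabha events, $\sigma_{\rm Bhabha}$ the Bhabha cross section within the acceptance, and $\varepsilon$ the selection efficiency; the symbols are generic notation, not the paper's. The remaining lines are plain arithmetic on the three quoted values.

```latex
\mathcal{L}_{\rm int} = \frac{N^{\rm obs}_{\rm Bhabha}}{\sigma_{\rm Bhabha}\,\varepsilon},
\qquad
4.995 + 8.157 + 4.191 = 17.343~\text{fb}^{-1},
\qquad
\frac{0.019}{4.995} \approx \frac{0.031}{8.157} \approx \frac{0.016}{4.191} \approx 0.38\%.
```

Each data set is thus measured to about 0.38% relative precision. If the systematic uncertainties were fully correlated across the three periods (the abstract does not say), they would add linearly to 0.066 fb$^{-1}$ for the combined sample.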
-
Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance
Authors:
Zhan Zhang,
Qin Zhang,
Yang Jiao,
Lin Lu,
Lin Ma,
Aihua Liu,
Xiao Liu,
Juan Zhao,
Yajun Xue,
Bing Wei,
Mingxia Zhang,
Ru Gao,
Hong Zhao,
Jie Lu,
Fan Li,
Yang Zhang,
Yiming Wang,
Lei Zhang,
Fengwei Tian,
Jie Hu,
Xin Gou
Abstract:
AI-aided clinical diagnosis is desirable in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without the problems of data collection, labeling, fitting, privacy, bias, generalization, high cost, and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in the real world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precision was no less than 95%, and the precision for every individual disease, including uncommon ones, was no less than 80%. After verification, the 46 DUCG models were applied in the real world in China. Over one million real diagnoses have been performed, with only 17 incorrect diagnoses identified. Owing to DUCG's transparency, the mistakes causing the incorrect diagnoses were found and corrected. The diagnostic abilities of clinicians who frequently applied DUCG improved significantly. Following an introduction to the previously presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.
Submitted 9 June, 2024;
originally announced June 2024.
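The error count above can be put in proportion with one line of arithmetic. Because the abstract says "over one million" cases, using exactly $10^6$ as the denominator gives an upper bound on the rate of identified incorrect diagnoses; it says nothing about errors that were never identified.

```latex
\frac{17}{10^{6}} = 1.7 \times 10^{-5} = 0.0017\%
```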
-
Single channel PICOSEC Micromegas detector with improved time resolution
Authors:
A. Utrobicic,
R. Aleksan,
Y. Angelis,
J. Bortfeldt,
F. Brunbauer,
M. Brunoldi,
E. Chatzianagnostou,
J. Datta,
K. Dehmelt,
G. Fanourakis,
D. Fiorina,
K. J. Floethner,
M. Gallinaro,
F. Garcia,
I. Giomataris,
K. Gnanvo,
F. J. Iguaz,
D. Janssens,
A. Kallitsopoulou,
M. Kovacic,
B. Kross,
P. Legou,
M. Lisowska,
J. Liu,
M. Lupberger
, et al. (25 additional authors not shown)
Abstract:
This paper presents design guidelines and experimental verification of a single-channel PICOSEC Micromegas (MM) detector with improved time resolution. The design encompasses the detector board, vessel, auxiliary mechanical parts, and electrical connectivity for high voltage (HV) and signals, focusing on improving stability, reducing noise, and ensuring signal integrity to optimize timing performance. A notable feature is the simple and fast reassembly procedure, which allows the detector's internal components to be replaced quickly and thus enables an efficient measurement strategy with different detector components. The paper also examines the influence of parasitics on output signal integrity. To validate the design, a prototype assembly and three interchangeable detector boards with different readout pad diameters were manufactured. The detectors were first tested in the laboratory. Finally, the timing performance of detectors with different pad sizes was verified in a Minimum Ionizing Particle (MIP) beam test. Notably, a detector with a 10 mm diameter readout pad achieved a time resolution of 12.5$\pm$0.8 ps, a record for PICOSEC Micromegas detectors with a CsI photocathode.
Submitted 9 June, 2024;
originally announced June 2024.
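A single time-resolution figure such as 12.5 ps is commonly derived from the spread of arrival-time differences between the detector and a reference timer. The sketch assumes that procedure: it takes the sample standard deviation of the time differences and, if the reference's own resolution is known, subtracts it in quadrature. The abstract does not describe the reference or the fit (a Gaussian fit is more usual than a raw standard deviation), so the function names and numbers are illustrative.

```python
import math
import statistics


def subtract_in_quadrature(measured_ps, reference_ps):
    """Remove an independent Gaussian contribution: s_det^2 = s_meas^2 - s_ref^2."""
    if reference_ps > measured_ps:
        raise ValueError("reference resolution exceeds the measured spread")
    return math.sqrt(measured_ps ** 2 - reference_ps ** 2)


def time_resolution(time_diffs_ps, reference_sigma_ps=0.0):
    """Detector resolution from detector-minus-reference time differences (ps)."""
    return subtract_in_quadrature(statistics.stdev(time_diffs_ps), reference_sigma_ps)
```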
-
ATLAS: Improving Lay Summarisation with Attribute-based Control
Authors:
Zhihao Zhang,
Tomas Goldsack,
Carolina Scarton,
Chenghua Lin
Abstract:
Lay summarisation aims to produce summaries of scientific articles that are comprehensible to non-expert audiences. However, previous work assumes a one-size-fits-all approach, where the content and style of the produced summary depend entirely on the data used to train the model. In practice, audiences with different levels of expertise have specific needs, which affect what content should appear in a lay summary and how it should be presented. To address this, we propose ATLAS, a novel abstractive summarisation approach that can control various properties contributing to the overall "layness" of the generated summary using targeted control attributes. We evaluate ATLAS on a combination of biomedical lay summarisation datasets, where it outperforms state-of-the-art baselines on mainstream summarisation metrics. Additional analyses of the discriminatory power and emergent influence of our selected controllable attributes further attest to the effectiveness of our approach.
Submitted 8 June, 2024;
originally announced June 2024.
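One common way to make a summary property controllable is to compute it for each reference summary, bucket it, and prepend the bucket as a control token during training, letting the user choose the token at inference. The sketch does this for readability using the standard Flesch-Kincaid grade level formula with a crude vowel-group syllable counter. The abstract does not say which attributes ATLAS uses or how it bins them, so `control_prefix`, the bin edges, and the `<readability_k>` tokens are hypothetical.

```python
import re


def count_syllables(word):
    """Crude estimate: number of vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59


def control_prefix(text, bins=(6, 10, 14)):
    """Map a summary's grade level to a hypothetical control token."""
    bucket = sum(fkgl(text) > edge for edge in bins)
    return f"<readability_{bucket}>"
```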
-
A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+
Authors:
Jianzhao Wang,
Yanyan Wei,
Dehua Hu,
Yilin Zhang,
Shengeng Tang,
Dan Guo,
Zhao Zhang
Abstract:
This technical report presents our team's solution for the WeatherProof Dataset Challenge: Semantic Segmentation in Adverse Weather at CVPR'24 UG2+. We propose a two-stage deep learning framework for this task. In the first stage, we preprocess the provided dataset by concatenating images into video sequences. We then leverage a low-rank video deraining method to generate high-fidelity pseudo ground truths. These pseudo ground truths are better aligned than the original ground truths, facilitating model convergence during training. In the second stage, we train the InternImage network for semantic segmentation using the generated pseudo ground truths. Notably, our framework is robust to degraded data captured under adverse weather conditions. In the challenge, our solution achieved a mean Intersection over Union (mIoU) of 0.43, ranking 4th.
Submitted 8 June, 2024;
originally announced June 2024.
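The reported score is the mean Intersection over Union. A minimal per-pixel computation on flattened label lists is sketched below. It skips classes that appear in neither prediction nor ground truth and excludes an optional ignore label; these are common conventions, not necessarily the challenge's exact evaluation rules.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p, t in zip(pred, target):
            if ignore_index is not None and t == ignore_index:
                continue  # pixels without a valid label do not count
            in_pred, in_target = p == c, t == c
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```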
-
Near-Field Channel Estimation for Extremely Large-Scale Terahertz Communications
Authors:
Songjie Yang,
Yizhou Peng,
Wanting Lyu,
Ya Li,
Hongjun He,
Zhongpei Zhang,
Chau Yuen
Abstract:
Future Terahertz communications show significant potential for accommodating ultra-high-rate services. Employing extremely large-scale array antennas is a key approach to realizing this potential, as they can harness substantial beamforming gains to overcome severe path loss and leverage the electromagnetic advantages of the near field. This paper proposes novel estimation methods designed to enhance efficiency in Terahertz widely-spaced multi-subarray (WSMS) systems. First, we introduce three sparse channel representation methods: polar-domain representation (PD-R), multi-angular-domain representation (MAD-R), and two-dimensional polar-angular-domain representation (2D-PAD-R). Each method is tailored to near-field WSMS channels and exploits their sparsity. Building on this, we propose four estimation frameworks based on sparse recovery theory: polar-domain estimation (PD-E), multi-angular-domain estimation (MAD-E), two-stage polar-angular-domain estimation (TS-PAD-E), and two-dimensional polar-angular-domain estimation (2D-PAD-E). In particular, 2D-PAD-E, which integrates a 2D dictionary process, and TS-PAD-E, which estimates angle and distance sequentially, stand out as effective for near-field angle-distance estimation, enabling decoupled calculation of these parameters. Overall, these frameworks provide versatile and efficient solutions for WSMS channel estimation, balancing low complexity with high performance. They also offer a fresh perspective on near-field signal processing.
Submitted 8 June, 2024;
originally announced June 2024.
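The near-field effect that polar-domain representations exploit can be seen by comparing a spherical-wave steering vector, indexed by angle and distance, with the planar-wave (far-field) one, indexed by angle only. The sketch assumes a single half-wavelength uniform linear array centred at the origin, not the paper's widely-spaced multi-subarray geometry. Array size, wavelength (3 mm, i.e. 100 GHz), and function names are illustrative.

```python
import cmath
import math


def element_offset(n, n_ant, spacing):
    """Position of element n along the array axis, measured from the array centre."""
    return (2 * n - n_ant + 1) / 2 * spacing


def near_field_steering(n_ant, spacing, wavelength, theta, r):
    """Spherical-wave response for a source at angle theta (rad) and distance r (m)."""
    k = 2 * math.pi / wavelength
    vec = []
    for n in range(n_ant):
        delta = element_offset(n, n_ant, spacing)
        r_n = math.sqrt(r * r + delta * delta - 2 * r * delta * math.sin(theta))
        # r_n - r, rewritten to avoid cancellation when r is much larger than delta
        path_diff = (delta * delta - 2 * r * delta * math.sin(theta)) / (r_n + r)
        vec.append(cmath.exp(-1j * k * path_diff) / math.sqrt(n_ant))
    return vec


def far_field_steering(n_ant, spacing, wavelength, theta):
    """Planar-wave response: the first-order limit r_n - r ~ -delta * sin(theta)."""
    k = 2 * math.pi / wavelength
    return [cmath.exp(1j * k * element_offset(n, n_ant, spacing) * math.sin(theta)) / math.sqrt(n_ant)
            for n in range(n_ant)]


def coherence(a, b):
    """|a^H b| for unit-norm vectors: 1 means identical up to a common phase."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b)))
```

For a 256-element array at 3 mm wavelength, the aperture is about 0.38 m and the Rayleigh distance $2D^2/\lambda$ is roughly 98 m. A source 2 m away yields a near-field vector with low coherence to every far-field vector at its angle, so an angle-only dictionary is no longer a sparse basis. That is the motivation for dictionaries whose atoms are indexed by both angle and distance.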
-
Integrating Text and Image Pre-training for Multi-modal Algorithmic Reasoning
Authors:
Zijian Zhang,
Wei Liu
Abstract:
In this paper, we present our solution for the SMART-101 Challenge of the CVPR Multi-modal Algorithmic Reasoning Task 2024. Unlike traditional visual question answering tasks, this challenge evaluates the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children aged 6-8. Our model is based on two pre-trained models that extract features from text and images, respectively. To integrate features from the different modalities, we employ a fusion layer with an attention mechanism. We explore different text and image pre-trained models and fine-tune the integrated classifier on the SMART-101 dataset. Experimental results show that, under the puzzle-split data setting, our proposed integrated classifier achieves superior performance, verifying the effectiveness of multi-modal pre-trained representations.
Submitted 7 June, 2024;
originally announced June 2024.
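The abstract describes a fusion layer with an attention mechanism but not its exact form. The sketch shows one minimal possibility: the text feature acts as a query in scaled dot-product attention over image patch features, and the attended image vector is concatenated with the text feature before classification. It omits learned projections, so text and image features are assumed to share a dimension; all names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one query vector over key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return attended, weights


def fuse(text_feat, image_patches):
    """Text feature attends over image patches; output is the concatenation."""
    attended, _ = cross_attention(text_feat, image_patches, image_patches)
    return text_feat + attended
```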
-
Topological photonic alloy
Authors:
Tiantao Qu,
Mudi Wang,
Xiaoyu Cheng,
Xiaohan Cui,
Ruo-Yang Zhang,
Zhao-Qing Zhang,
Lei Zhang,
Jun Chen,
C. T. Chan
Abstract:
We present the new concept of a photonic alloy as a non-periodic topological material. By mixing non-magnetized and magnetized rods in a non-periodic 2D photonic crystal configuration, we realized photonic alloys in the microwave regime. Our experimental findings reveal that the photonic alloy sustains non-reciprocal chiral edge states (CESs) even at very low concentrations of magnetized rods. The non-trivial topology and associated edge states of these non-periodic systems can be characterized by the winding of the reflection phase. Our results indicate that, for the investigated system within the first non-trivial band gap, the threshold concentration for topological behavior approaches zero in the thermodynamic limit for substitutional alloys, while it remains non-zero for interstitial alloys. At low concentrations, the system exhibits an inhomogeneous structure characterized by isolated, widely spaced patches of non-percolating magnetic domains within a topologically trivial photonic crystal. Surprisingly, the system manifests CESs even though time-reversal symmetry is broken only locally rather than globally. Photonic alloys represent a new category of disordered topological materials, offering exciting opportunities for exploring topological materials with adjustable gaps.
Submitted 7 June, 2024;
originally announced June 2024.
-
RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
Authors:
Liting Huang,
Zhihao Zhang,
Yiran Zhang,
Xiyue Zhou,
Shou** Wang
Abstract:
The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robust methods for detecting machine-generated content are still in the early stages of development. In this paper, we introduce RU-AI, a new large-scale multimodal dataset designed for the robust and efficient detection of machine-generated content in text, image, and voice. Our dataset is constructed from three large publicly available datasets, Flickr8K, COCO, and Places205, by combining the original datasets with their corresponding machine-generated pairs. Additionally, experimental results show that our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data (i.e., original data samples or machine-generated ones) in RU-AI. However, future work is still required to address the remaining challenges posed by RU-AI. The source code and dataset are available at https://github.com/ZhihaoZhang97/RU-AI.
Submitted 7 June, 2024;
originally announced June 2024.
-
Magnetism of $\mathrm{NaYbS_2}$: From finite temperatures to ground state
Authors:
Weizhen Zhuo,
Zheng Zhang,
Mingtai Xie,
Anmin Zhang,
Jianting Ji,
Feng **,
Qingming Zhang
Abstract:
Rare-earth chalcogenide compounds $\mathrm{ARECh_2}$ (A = alkali or monovalent metal, RE = rare earth, Ch = O, S, Se, Te) are a large family of quantum spin liquid (QSL) candidate materials. $\mathrm{NaYbS_2}$ is a representative member of the family. Several key issues regarding $\mathrm{NaYbS_2}$, particularly how to determine the highly anisotropic spin Hamiltonian and how to describe the magnetism at finite temperatures and in the ground state, remain to be addressed. In this paper, we conducted an in-depth and comprehensive study of the magnetism of $\mathrm{NaYbS_2}$ from finite temperatures to the ground state. First, we detected three crystalline electric field (CEF) excitation energy levels using low-temperature Raman scattering. Combining them with CEF theory and magnetization data, we determined the CEF parameters, CEF energy levels, and CEF wavefunctions. We further determined a characteristic temperature of $\sim$40 K, above which the magnetism is dominated by CEF excitations and below which spin-exchange interactions play the main role. This characteristic temperature is confirmed by the temperature-dependent electron spin resonance (ESR) linewidth. Low-temperature ESR experiments on the magnetically diluted crystal $\mathrm{NaYb_{0.1}Lu_{0.9}S_2}$ further helped us determine the $g$-factor accurately. Next, we quantitatively obtained the spin-exchange interactions in the spin Hamiltonian by consistently simulating the magnetization and specific heat data. Finally, these studies allow us to explore the ground-state magnetism of $\mathrm{NaYbS_2}$ using the density matrix renormalization group. Combining numerical calculations and experimental results, we demonstrate that the ground state of $\mathrm{NaYbS_2}$ is a Dirac-like QSL.
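The $g$-factor determination rests on the standard ESR resonance condition $h\nu = g\,\mu_B B_{\mathrm{res}}$ for an effective spin-1/2 doublet. The sketch below simply inverts that relation; for an anisotropic Kramers doublet such as Yb$^{3+}$ it would be applied separately for each field orientation. The frequency and field values in the test are illustrative X-band numbers, not data from the paper, and `g_factor` is a hypothetical helper.

```python
# Physical constants in SI units
PLANCK_H = 6.62607015e-34         # J s (exact SI value)
BOHR_MAGNETON = 9.2740100783e-24  # J/T (CODATA 2018)


def g_factor(frequency_hz: float, resonance_field_t: float) -> float:
    """Invert the ESR resonance condition h*nu = g*mu_B*B_res for g."""
    return PLANCK_H * frequency_hz / (BOHR_MAGNETON * resonance_field_t)
```

With these constants, a 9.4 GHz resonance at about 0.336 T corresponds to $g \approx 2.0$.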
Submitted 7 June, 2024;
originally announced June 2024.
-
Mixed Finite Element Method for Multi-layer Elastic Contact Systems
Authors:
Zhizhuo Zhang,
Mikaël Barboteu,
Xiaobing Nie,
Serge Dumont,
Mahmoud Abdel-Aty,
**de Cao
Abstract:
With the development of multi-layer elastic systems in engineering mechanics, the corresponding variational inequality theory and algorithm design have attracted increasing research attention. In this study, a class of equivalent saddle point problems with interlayer Tresca friction conditions and the associated mixed finite element method are proposed and analyzed. The convergence of the numerical solution of the mixed finite element method is then proven theoretically, and the corresponding algebraic dual algorithm is given. Finally, numerical experiments compare the mixed finite element method with the layer decomposition method and verify its convergence with respect to the spatial discretization parameter $H$.
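The abstract reports verifying convergence with respect to $H$ but does not state the observed rate. A common way to check such a relationship is to compute the observed order $p = \log(e_i/e_{i+1}) / \log(H_i/H_{i+1})$ from errors at successively refined mesh sizes. This is a minimal sketch of that generic check; `observed_orders` is hypothetical and the errors in the test are synthetic.

```python
import math
from typing import List


def observed_orders(hs: List[float], errors: List[float]) -> List[float]:
    """Observed convergence order between successive mesh sizes:
    p_i = log(e_i / e_{i+1}) / log(h_i / h_{i+1})."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
        for i in range(len(hs) - 1)
    ]
```

If the error behaves like $C H^p$, every entry of the returned list approaches $p$; scattered values usually mean the meshes are not yet in the asymptotic regime.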
Submitted 6 June, 2024;
originally announced June 2024.
-
A layer decomposition method for multi-layer elastic contact systems with interlayer Tresca friction
Authors:
Zhizhuo Zhang,
Xiaobing Nie,
Mikaël Barboteu,
**de Cao
Abstract:
With the increasing demand for accurate numerical simulation in pavement mechanics, the variational inequality model, together with its induced finite element method, which can simulate the interlayer contact state, has become a potential solution. In this paper, a layer decomposition algorithm for solving variational inequality models of multi-layer elastic contact systems with interlayer Tresca friction conditions is studied. Continuous and discrete versions of the algorithm are proposed, and their convergence theorems are proved. Then, the algebraic form of the executable optimization algorithm and the numerical experimental results verify the practicability of the variational inequality model and its algorithm in pavement mechanics modeling.
Submitted 6 June, 2024;
originally announced June 2024.
-
SF-V: Single Forward Video Generation Model
Authors:
Zhixing Zhang,
Yanyu Li,
Yushu Wu,
Yanwu Xu,
Anil Kag,
Ivan Skorokhodov,
Willi Menapace,
Aliaksandr Siarohin,
Junli Cao,
Dimitris Metaxas,
Sergey Tulyakov,
Jian Ren
Abstract:
Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through adversarial training, the multi-step video diffusion model Stable Video Diffusion (SVD) can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are publicly available at https://snap-research.github.io/SF-V.
Submitted 6 June, 2024;
originally announced June 2024.
-
Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving
Authors:
Xiaosong Jia,
Zhenjie Yang,
Qifeng Li,
Zhiyuan Zhang,
Junchi Yan
Abstract:
In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold, where end-to-end autonomous driving (E2E-AD) has emerged owing to its potential to scale up in a data-driven manner. However, existing E2E-AD methods are mostly evaluated in an open-loop log-replay manner with L2 errors and collision rate as metrics (e.g., in nuScenes), which cannot fully reflect the driving performance of algorithms, as recently acknowledged in the community. E2E-AD methods evaluated under the closed-loop protocol are tested on fixed routes (e.g., Town05Long and Longest6 in CARLA) with the driving score as the metric, which is known to have high variance due to the unsmoothed metric function and the large randomness of long routes. Besides, these methods usually collect their own training data, which makes fair algorithm-level comparison infeasible.
To meet the paramount need for comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating the multiple abilities of E2E-AD systems in a closed-loop manner. Bench2Drive's official training data consists of 2 million fully annotated frames, collected from 10000 short clips uniformly distributed across 44 interactive scenarios (cut-in, overtaking, detour, etc.), 23 weather conditions (sunny, foggy, rainy, etc.), and 12 towns (urban, village, university, etc.) in CARLA v2. Its evaluation protocol requires E2E-AD models to pass 44 interactive scenarios across different locations and weather conditions, totaling 220 routes, and thus provides a comprehensive and disentangled assessment of their driving capability in different situations. We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights into the current status and future directions.
Submitted 11 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Time-resolved optical assessment of exciton formation in mixed two-dimensional perovskite films
Authors:
Zheng Zhang,
Jianan Wang,
Yijie Shi,
Xi Wang,
Zhong Wang,
Xiangyu Zhu,
Chunlong Hu,
Zonghao Liu,
Wei Chen,
Wenxi Liang
Abstract:
We report the observation of exciton formation from cooled band-edge carriers in mixed two-dimensional hybrid organic-inorganic perovskites using femtosecond transient absorption spectroscopy. By monitoring the changes in the bleach signal upon excitation at various photon energies, we are able to extract the exciton binding energy and the carrier occupancies of free and bound states for each two-dimensional phase. We also confirm the existence of Mahan excitons when the injected carrier density exceeds the Mott criterion.
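The abstract does not say how the free and bound occupancies were extracted. One common textbook link between exciton binding energy and free-carrier fraction is the Saha equation; in its two-dimensional form, ignoring degeneracy factors, $x^2/(1-x) = \frac{1}{n}\frac{\mu k_B T}{2\pi\hbar^2}e^{-E_b/k_B T}$, where $x$ is the free-carrier fraction, $n$ the total areal pair density, and $\mu$ the reduced mass. The sketch below solves this for $x$; it is an illustration of that relation, not necessarily the authors' method, and the parameter values in the test are hypothetical.

```python
import math

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J/K
M_E = 9.1093837015e-31   # kg
EV = 1.602176634e-19     # J


def free_carrier_fraction(density_m2: float, binding_ev: float,
                          reduced_mass_me: float, temperature_k: float) -> float:
    """Free-carrier fraction x from the 2D Saha equation
    x**2 / (1 - x) = (1/n) * (mu*kT / (2*pi*hbar**2)) * exp(-E_b / kT)."""
    mu = reduced_mass_me * M_E
    kt = K_B * temperature_k
    c = (mu * kt / (2 * math.pi * HBAR**2)) * math.exp(-binding_ev * EV / kt) / density_m2
    # Positive root of x**2 + c*x - c = 0, written as 2c / (c + sqrt(c^2 + 4c))
    # to avoid cancellation when c is large
    return 2 * c / (c + math.sqrt(c * c + 4 * c))
```

The fraction decreases as the binding energy or the carrier density increases, which is the qualitative trend that makes bleach-signal changes sensitive to both quantities.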
Submitted 6 June, 2024;
originally announced June 2024.
-
Energy-storing analysis and fishtail stiffness optimization for a wire-driven elastic robotic fish
Authors:
Xiaocun Liao,
Chao Zhou,
Junfeng Fan,
Zhuoliang Zhang,
Zhaoran Yin,
Liangwei Deng
Abstract:
Robotic fish with high propulsion efficiency and good maneuverability commonly achieve underwater fishlike propulsion by using a motor to drive the fishtail, which causes significant fluctuations in motor power because the fishtail's swing speed is uneven within each swing cycle. Hence, we propose a wire-driven robotic fish with a spring-steel-based active-segment elastic spine. This bionic spine deforms elastically to store energy under the action of the wire drive and motor, counteracting the fluctuations in motor power. Further, we analyze the effect of energy storage in the active-segment elastic spine on the smoothness of motor power. Based on the developed Lagrangian dynamic model and cantilever beam model, a power-variance-based nonlinear optimization model for the stiffness of the active-segment elastic spine is established to counteract the sharp fluctuations in motor power during each fishtail swing cycle. Results validate that, with a reasonably adjusted stiffness, energy storage in the active-segment elastic spine plays a vital role in reducing power fluctuations and raising the maximum motor frequency, which helps the robotic fish achieve high propulsion and high speed. Compared with an active-segment rigid spine, which cannot store energy, the energy-storing active-segment elastic spine increases the maximum motor frequency and the average fishtail thrust by 0.41 Hz and 0.06 N, respectively.
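The idea of choosing stiffness by minimizing power variance can be illustrated with a deliberately simplified toy model. Assume (hypothetically, unlike the authors' Lagrangian and cantilever-beam models) a single-degree-of-freedom tail with angle $\theta(t) = A\sin\omega t$, inertia $I$, an elastic spine of stiffness $k$, and quadratic drag $c\,\dot\theta|\dot\theta|$, so the motor power is $P = (I\ddot\theta + k\theta + c\,\dot\theta|\dot\theta|)\,\dot\theta$. The sketch computes the variance of $P$ over one swing cycle and grid-searches $k$.

```python
import math
from typing import List


def power_variance(stiffness: float, inertia: float, drag: float,
                   amplitude: float, omega: float, samples: int = 400) -> float:
    """Variance of motor power over one swing cycle of a toy 1-DOF tail.

    theta(t) = A sin(omega t); motor torque is
    tau = I*theta_dd + k*theta + c*theta_d*|theta_d| (inertia, spring, drag),
    and motor power is P = tau * theta_d.
    """
    period = 2 * math.pi / omega
    powers: List[float] = []
    for i in range(samples):
        t = period * i / samples
        theta = amplitude * math.sin(omega * t)
        theta_d = amplitude * omega * math.cos(omega * t)
        theta_dd = -amplitude * omega**2 * math.sin(omega * t)
        tau = inertia * theta_dd + stiffness * theta + drag * theta_d * abs(theta_d)
        powers.append(tau * theta_d)
    mean = sum(powers) / samples
    return sum((p - mean) ** 2 for p in powers) / samples


def best_stiffness(candidates: List[float], **params: float) -> float:
    """Grid search for the candidate stiffness with the smallest power variance."""
    return min(candidates, key=lambda k: power_variance(k, **params))
```

In this toy model the optimum lands near $k = I\omega^2$, where the spring torque cancels the inertial torque and only drag-induced fluctuations remain; the real spine and hydrodynamics would shift this optimum.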
Submitted 6 June, 2024;
originally announced June 2024.
-
OceanCastNet: A Deep Learning Ocean Wave Model with Energy Conservation
Authors:
Ziliang Zhang,
Huaming Yu,
Danqin Ren
Abstract:
Traditional wave forecasting models, although based on energy conservation equations, are computationally expensive. On the other hand, existing deep learning geophysical fluid models, while computationally efficient, often suffer from issues such as energy dissipation in long-term forecasts. This paper proposes a novel energy-balanced deep learning wave forecasting model called OceanCastNet (OCN). By incorporating wind fields at the current, previous, and future time steps, as well as wave fields at the current and previous time steps, as input variables, OCN maintains energy balance within the model. Furthermore, the model employs adaptive Fourier operators as its core components and uses a masked loss function to better handle the impact of land-sea boundaries. A series of experiments on the ERA5 dataset demonstrates that OCN can achieve short-term forecast accuracy comparable to traditional models while exhibiting an understanding of the wave generation process. In comparative experiments under both normal and extreme conditions, OCN consistently outperforms WaveWatch III, a model widely used in industry. Even after long-term forecasting, OCN maintains a stable and energy-rich state. By further constructing a simple meteorological model, OCN-wind, which considers energy balance, this paper confirms the importance of energy constraints for improving the long-term forecast performance of deep learning meteorological models. This finding provides new ideas for future research on deep learning geophysical fluid models.
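The abstract mentions a masked loss function for land-sea boundaries without giving its form. A minimal sketch, assuming a boolean ocean mask on the model grid and a plain mean squared error, averages the error over ocean cells only so that land cells neither contribute error nor dilute the average. The authors' loss may weight or define the terms differently, and `masked_mse` is a hypothetical name.

```python
from typing import Sequence


def masked_mse(pred: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]],
               ocean_mask: Sequence[Sequence[bool]]) -> float:
    """Mean squared error over ocean grid cells only; land cells are ignored."""
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, ocean_mask):
        for p, t, is_ocean in zip(p_row, t_row, m_row):
            if is_ocean:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no ocean cells")
    return total / count
```

Dividing by the number of ocean cells rather than the full grid size keeps the loss scale comparable between regions with different land fractions.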
Submitted 9 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
PREX and CREX: Evidence for Strong Isovector Spin-Orbit Interaction
Authors:
Tong-Gang Yue,
Zhen Zhang,
Lie-Wen Chen
Abstract:
The recent PREX-2 and CREX data on the model-independent extraction of the charge-weak form factor difference $ΔF_{\rm CW}$ in $^{208}$Pb and $^{48}$Ca challenge modern nuclear energy density functionals (EDFs) as well as our present understanding of the neutron skin and nuclear symmetry energy. Within Skyrme-like EDFs, we demonstrate that the isovector spin-orbit interaction can strongly change $ΔF_{\rm CW}$ in $^{48}$Ca while having essentially no influence on $ΔF_{\rm CW}$ in $^{208}$Pb, mainly due to the eight spin-orbit unpaired $1f_{7/2}$ neutrons in $^{48}$Ca. To describe the PREX-2 and CREX data simultaneously within $1σ$ errors, we find that the strength of the isovector spin-orbit interaction should exceed about four times that in conventional Skyrme-like EDFs, implying that neutrons and protons experience significantly different spin-orbit interactions. To further reconcile the data on the electric dipole polarizability in $^{208}$Pb and $^{48}$Ca, we obtain $L \approx 55$ MeV for the slope parameter of the symmetry energy, and $Δr_{\rm np}(^{208}\rm{Pb}) \approx 0.19$ fm and $Δr_{\rm np}(^{48}\rm{Ca}) \approx 0.12$ fm for the neutron skin thickness. The implications of the strong isovector spin-orbit interaction are discussed.
Submitted 6 June, 2024;
originally announced June 2024.
-
Blow-up of cylindrically symmetric solutions for Fractional NLS
Authors:
Tianxiang Gou,
Vicentiu D. Radulescu,
Zhitao Zhang
Abstract:
In this paper, we consider the blow-up of solutions to the Cauchy problem for the following fractional NLS, $$ \textnormal{i} \, \partial_t u=(-Δ)^s u-|u|^{2 σ} u \quad \text{in} \,\, \mathbb{R} \times \mathbb{R}^N, $$ where $N \geq 2$, $1/2 <s<1$ and $0<σ<2s/(N-2s)$. In the mass critical and supercritical cases, we establish a criterion for the blow-up of solutions with cylindrically symmetric data. We also establish the existence of finite-time blow-up solutions in the mass supercritical case. These results extend the known blow-up results for radially symmetric data.
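The terms "mass critical" and "mass supercritical" and the upper bound $σ<2s/(N-2s)$ come from the standard scaling symmetry of this equation. The derivation below is the usual scaling argument written for the equation above; it is included as background, not as a result of the paper.

```latex
\begin{aligned}
&u_\lambda(t,x) = \lambda^{s/\sigma}\, u\!\left(\lambda^{2s} t,\, \lambda x\right), \qquad \lambda > 0, \\
&\partial_t u_\lambda,\ (-\Delta)^s u_\lambda,\ |u_\lambda|^{2\sigma} u_\lambda
  \ \text{each carry the factor } \lambda^{s/\sigma + 2s}
  \ \Rightarrow\ u_\lambda \text{ is again a solution}, \\
&\|u_\lambda(t)\|_{L^2}^2 = \lambda^{2s/\sigma - N}\, \|u(\lambda^{2s} t)\|_{L^2}^2
  \ \Rightarrow\ \text{mass critical} \iff \sigma = \frac{2s}{N}, \\
&\|(-\Delta)^{s/2} u_\lambda(t)\|_{L^2}^2 = \lambda^{2s/\sigma + 2s - N}\, \|(-\Delta)^{s/2} u(\lambda^{2s} t)\|_{L^2}^2
  \ \Rightarrow\ \text{energy critical} \iff \sigma = \frac{2s}{N-2s}.
\end{aligned}
```

The mass supercritical range is therefore $2s/N < σ < 2s/(N-2s)$, which is nonempty because $N-2s < N$.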
Submitted 6 June, 2024;
originally announced June 2024.
-
Global tensor polarization of spin $3/2$ hadrons and quark spin correlations in relativistic heavy ion collisions
Authors:
Zhe Zhang,
Ji-peng Lv,
Zi-han Yu,
Zuo-tang Liang
Abstract:
We study the global polarization of spin-$3/2$ hadrons in relativistic heavy ion collisions. We show in particular that the global tensor polarizations of rank two and rank three for spin-$3/2$ hadrons are sensitive to the local two- and three-quark spin correlations, respectively, in the quark gluon plasma produced in the collision processes. We present the relationships between these measurable tensor polarizations and the quark spin correlations in the quark matter system.
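Why ranks two and three appear for spin-$3/2$ follows from standard parameter counting for a spin-$s$ density matrix: it is a Hermitian $(2s+1)\times(2s+1)$ matrix with unit trace, and its independent parameters decompose into irreducible tensors of rank $L = 1, \dots, 2s$. The counting below is textbook background, not a result of the paper.

```latex
\begin{aligned}
&\rho \in \mathbb{C}^{4\times 4},\quad \rho = \rho^\dagger,\quad \operatorname{Tr}\rho = 1
  \ \Rightarrow\ (2s+1)^2 - 1 = 15 \ \text{real parameters for } s = \tfrac{3}{2}, \\
&15 = \sum_{L=1}^{2s} (2L+1)
  = \underbrace{3}_{L=1\ \text{(vector)}} + \underbrace{5}_{L=2\ \text{(rank two)}} + \underbrace{7}_{L=3\ \text{(rank three)}}.
\end{aligned}
```

Rank-three polarization therefore first appears at spin $3/2$; a spin-1 hadron has at most rank-two tensor polarization.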
Submitted 6 June, 2024;
originally announced June 2024.
-
Field Theory of Active Brownian Particles with Dry Friction
Authors:
Ziluo Zhang,
Shurui Yuan,
Shigeyuki Komura
Abstract:
We present a field-theoretic approach to describe the motion of one- and two-dimensional diffusive particles with dry friction, and further extend the framework to two-dimensional active Brownian particles. Starting from the Fokker-Planck equation and introducing Hermite polynomials as the corresponding eigenfunctions, we obtain the actions and propagators. Using a perturbation expansion, we calculate the effective diffusion coefficient in the presence of both wet and dry friction via the Green-Kubo relation. We further compare the analytical results with numerical simulations. Our results can be used to estimate the dry friction coefficient in experiments.
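A numerical cross-check of the kind mentioned above can be sketched for the one-dimensional case. Assume the Langevin model $dv = (-\gamma v - \Delta\,\mathrm{sign}\,v)\,dt + \sqrt{2D}\,dW$, $dx = v\,dt$, with wet friction $\gamma$, dry friction $\Delta$, and velocity noise strength $D$ (hypothetical parameter names). The sketch estimates $D_{\mathrm{eff}}$ from the long-time mean squared displacement (the Einstein relation), which equals the Green-Kubo integral of the velocity autocorrelation in the long-time limit. With $\Delta = 0$ it should reproduce the Ornstein-Uhlenbeck value $D/\gamma^2$.

```python
import math
import random


def effective_diffusion(gamma: float, dry: float, noise: float,
                        n_traj: int = 400, t_max: float = 50.0,
                        dt: float = 0.05, seed: int = 0) -> float:
    """Estimate D_eff = <x(T)^2> / (2T) for the 1D Langevin model
    dv = (-gamma*v - dry*sign(v)) dt + sqrt(2*noise) dW,  dx = v dt,
    integrated with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    steps = int(t_max / dt)
    kick = math.sqrt(2.0 * noise * dt)
    msd = 0.0
    for _ in range(n_traj):
        # Start from the stationary velocity distribution without dry friction
        v = rng.gauss(0.0, math.sqrt(noise / gamma))
        x = 0.0
        for _ in range(steps):
            sign = (v > 0) - (v < 0)
            v += (-gamma * v - dry * sign) * dt + kick * rng.gauss(0.0, 1.0)
            x += v * dt
        msd += x * x
    return msd / n_traj / (2.0 * t_max)
```

Because dry friction always pulls the velocity toward zero, it shrinks both the velocity variance and its correlation time, so the estimated $D_{\mathrm{eff}}$ drops noticeably below $D/\gamma^2$ once $\Delta$ is comparable to $\sqrt{D\gamma}$. The estimate carries a statistical error of roughly $\sqrt{2/n_{\mathrm{traj}}}$ in relative terms.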
Submitted 6 June, 2024;
originally announced June 2024.
-
Phonon heat conduction across slippery interfaces in twisted graphite
Authors:
Fuwei Yang,
Wenjiang Zhou,
Zhibin Zhang,
Xuanyu Huang,
**gwen Zhang,
Nianjie Liang,
Wujuan Yan,
Yuxi Wang,
Mingchao Ding,
Quanlin Guo,
Yu Han,
Te-Huan Liu,
Kaihui Liu,
Quanshui Zheng,
Bai Song
Abstract:
Interlayer rotation in van der Waals (vdW) materials offers great potential for manipulating phonon dynamics and heat flow in advanced electronics with ever higher compactness and power density. However, despite extensive theoretical efforts in recent years, experimental measurements remain scarce, largely because of the critical challenges of preparing single-crystalline twisted interfaces and probing interfacial thermal transport with sufficient resolution. Here, we exploited the intrinsic twisted interfaces in highly oriented pyrolytic graphite (HOPG). By developing novel experimental schemes based on microfabricated mesas, we achieved simultaneous mechanical characterization and thermal measurements. In particular, we pushed the HOPG mesas with a microprobe to identify and rotate single-crystalline intrinsic interfaces, owing to their slippery nature, as is well known in structural superlubricity. Remarkably, using epitaxial graphite as a control, we observed an over 30-fold suppression of thermal conductance at the slippery interfaces. Nonetheless, the interfacial conductance remains around 600 $\mathrm{MW\,m^{-2}\,K^{-1}}$, which surpasses the highest values for artificially stacked vdW structures by more than five times. Further, atomic simulations revealed the predominant role of transverse acoustic phonons. Together, our findings highlight a general physical picture that directly correlates interfacial thermal transport with sliding resistance, and lay the foundation for twist-enabled thermal management, which is particularly beneficial to twistronics and slidetronics.
Submitted 6 June, 2024;
originally announced June 2024.
-
Efficient Knowledge Infusion via KG-LLM Alignment
Authors:
Zhouyu Jiang,
Ling Zhong,
Mengshu Sun,
Jun Xu,
Rui Sun,
Hui Cai,
Shuhan Luo,
Zhiqiang Zhang
Abstract:
To tackle the problem of domain-specific knowledge scarcity within large language models (LLMs), knowledge graph retrieval-augmented methods have proven to be effective and efficient techniques for knowledge infusion. However, existing approaches face two primary challenges: knowledge mismatch between publicly available knowledge graphs and the specific domain of the task at hand, and poor compliance of LLMs with the information in knowledge graphs. In this paper, we leverage a small set of labeled samples and a large-scale corpus to efficiently construct domain-specific knowledge graphs with an LLM, addressing the issue of knowledge mismatch. Additionally, we propose a three-stage KG-LLM alignment strategy to enhance the LLM's capability to utilize information from knowledge graphs. We conduct experiments in a limited-sample setting on two biomedical question-answering datasets, and the results demonstrate that our approach outperforms existing baselines.
Submitted 6 June, 2024;
originally announced June 2024.
-
A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions
Authors:
Lei Liu,
Xiaoyan Yang,
Junchi Lei,
Xiaoyang Liu,
Yue Shen,
Zhiqiang Zhang,
Peng Wei,
**jie Gu,
Zhixuan Chu,
Zhan Qin,
Kui Ren
Abstract:
Large language models (LLMs), such as GPT series models, have received substantial attention due to their impressive capabilities for generating and understanding human-level language. More recently, LLMs have emerged as an innovative and powerful adjunct in the medical field, transforming traditional practices and heralding a new era of enhanced healthcare services. This survey provides a comprehensive overview of Medical Large Language Models (Med-LLMs), outlining their evolution from the general to the medical-specific domain (i.e., Technology and Application), as well as their transformative impact on healthcare (e.g., Trustworthiness and Safety). Concretely, starting from the fundamental history and technology of LLMs, we first delve into the progressive adaptation and refinement of general LLMs in the medical domain, especially emphasizing the advanced algorithms that boost LLMs' performance in handling complicated medical environments, including clinical reasoning, knowledge graphs, retrieval-augmented generation, human alignment, and multi-modal learning. Second, we explore the extensive applications of Med-LLMs across domains such as clinical decision support, report generation, and medical education, illustrating their potential to streamline healthcare services and improve patient outcomes. Third, recognizing the imperative of responsible innovation, we discuss the challenges of ensuring fairness, accountability, privacy, and robustness in Med-LLM applications. Finally, we briefly discuss possible future trajectories of Med-LLMs, identifying avenues for their prudent expansion. By consolidating the above-mentioned insights, this review seeks to provide professionals and researchers with a comprehensive investigation of the potential strengths and limitations of Med-LLMs, ensuring a responsible landscape in the healthcare setting.
Submitted 5 June, 2024;
originally announced June 2024.
-
Ferroelectricity-tuned band topology and superconductivity in two-dimensional materials and related heterostructures
Authors:
Jianyong Chen,
** Cui,
Zhenyu Zhang
Abstract:
Ferroelectricity, band topology, and superconductivity are respectively local, global, and macroscopic properties of quantum materials, and understanding their mutual couplings offers unique opportunities for exploring rich physics and enhanced functionalities. In this mini-review, we attempt to highlight some of the latest advances in this vibrant area, focusing in particular on ferroelectricity-tuned superconductivity and band topology in two-dimensional (2D) materials and related heterostructures. We first present results from predictive studies of the delicate couplings between ferroelectricity and topology or superconductivity based on first-principles calculations and phenomenological modeling, with ferroelectricity-enabled topological superconductivity as an appealing objective. Next, we cover the latest experimental advances on ferroelectricity-tuned superconductivity based on different 2D materials or van der Waals heterostructures. Finally, as an outlook, we outline schemes that may enable the realization of new types of 2D systems that simultaneously harbor ferroelectricity and superconductivity, or that may lead to enhanced ferroelectric superconductivity, ferroelectric topological superconductivity, and new types of superconducting devices such as superconducting diodes.
Submitted 5 June, 2024;
originally announced June 2024.
-
Principles of Designing Robust Remote Face Anti-Spoofing Systems
Authors:
Xiang Xu,
Tianchen Zhao,
Zheng Zhang,
Zhihua Li,
Jon Wu,
Alessandro Achille,
Mani Srivastava
Abstract:
Protecting the digital identities of human faces from various attack vectors is paramount, and face anti-spoofing plays a crucial role in this endeavor. Current approaches primarily focus on detecting presentation attacks within individual frames. However, the emergence of hyper-realistic generative models capable of real-time operation has heightened the risk of digitally generated attacks. In light of these evolving threats, this paper addresses two key aspects. First, it sheds light on the vulnerabilities of state-of-the-art face anti-spoofing methods to digital attacks. Second, it presents a comprehensive taxonomy of common threats encountered in face anti-spoofing systems. Through a series of experiments, we demonstrate the limitations of current face anti-spoofing detection techniques and their failure to generalize to novel digital attack scenarios. Notably, existing models struggle with digital injection attacks, including adversarial noise, realistic deepfake attacks, and digital replay attacks. To aid in the design and implementation of robust face anti-spoofing systems resilient to these emerging vulnerabilities, the paper proposes key design principles spanning model accuracy and robustness, pipeline robustness, and even platform robustness. In particular, we suggest implementing proactive face anti-spoofing systems with active sensors to significantly reduce the risk from unseen attack vectors and improve the user experience.
Submitted 5 June, 2024;
originally announced June 2024.
-
Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models
Authors:
Zejun Zhang,
Zhenchang Xing,
Xiaoxue Ren,
Qinghua Lu,
Xiwei Xu
Abstract:
Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use Pythonic idioms. Neither a rule-based approach nor an LLM-only approach is sufficient to overcome three persistent challenges of code idiomatization: code miss, wrong detection, and wrong refactoring. Motivated by the determinism of rules and the adaptability of LLMs, we propose a hybrid approach consisting of three modules. We not only write prompts to instruct LLMs to complete tasks, but also invoke Analytic Rule Interfaces (ARIs) to accomplish tasks. ARIs are Python code that LLMs are prompted to generate. We first construct a knowledge module with three elements, ASTscenario, ASTcomponent, and Condition, and prompt LLMs to generate Python code for incorporation into an ARI library for subsequent use. After that, for any syntax-error-free Python code, we invoke ARIs from the ARI library to extract ASTcomponents from the ASTscenario, and then filter out ASTcomponents that do not meet the Condition. Finally, we design prompts to instruct LLMs to abstract and idiomatize code, and then invoke ARIs from the ARI library to rewrite non-idiomatic code into idiomatic code. We then conduct a comprehensive evaluation of our approach, RIdiom, and Prompt-LLM on nine established Pythonic idioms in RIdiom. Our approach achieves superior accuracy, F1-score, and recall while maintaining precision comparable to RIdiom, with every metric exceeding or approaching 90% for each idiom. Lastly, we extend our evaluation to four new Pythonic idioms. Our approach consistently outperforms Prompt-LLM, with accuracy, F1-score, precision, and recall consistently exceeding 90%.
Submitted 5 June, 2024;
originally announced June 2024.
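The sketch below gives a rough picture of the ASTscenario / ASTcomponent / Condition split described in this abstract. It hand-writes one rule-style check using Python's standard `ast` module. The check scans `for` loops (the scenario) and takes the single statement in each loop body (the component). It then tests whether that statement is a `<name>.append(<expr>)` call (the condition), and if so, emits an equivalent list comprehension. All function names are hypothetical. The rule is hand-written rather than LLM-generated, and it is not the paper's ARI library. Python 3.9+ is assumed for `ast.unparse`.

```python
import ast

def is_append_loop(node: ast.AST) -> bool:
    """Condition: `for <target> in <iter>: <name>.append(<expr>)`, single statement, no else."""
    if not isinstance(node, ast.For) or node.orelse or len(node.body) != 1:
        return False
    stmt = node.body[0]  # the component inspected inside the For scenario
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Call)
        and isinstance(stmt.value.func, ast.Attribute)
        and stmt.value.func.attr == "append"
        and isinstance(stmt.value.func.value, ast.Name)
        and len(stmt.value.args) == 1
        and not stmt.value.keywords
    )

def find_append_loops(source: str) -> list:
    """Parse syntax-error-free source and return every For node that passes the condition."""
    return [node for node in ast.walk(ast.parse(source)) if is_append_loop(node)]

def to_list_comprehension(loop: ast.For) -> str:
    """Rewrite a matched loop as `<name> = [<expr> for <target> in <iter>]`.
    Simplification: assumes <name> is bound to an empty list right before the loop."""
    call = loop.body[0].value
    comp = ast.ListComp(
        elt=call.args[0],
        generators=[ast.comprehension(target=loop.target, iter=loop.iter, ifs=[], is_async=0)],
    )
    return f"{call.func.value.id} = {ast.unparse(comp)}"

code = """squares = []
for x in range(10):
    squares.append(x * x)
for y in range(3):
    print(y)
"""
loops = find_append_loops(code)
print([loop.lineno for loop in loops])      # -> [2]
print(to_list_comprehension(loops[0]))      # -> squares = [x * x for x in range(10)]
```

The second loop is not matched because `print(y)` is not an `.append` call. A production rule would also need to verify that the list is empty before the loop. Skipping that check is the kind of "wrong refactoring" the abstract warns about.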
-
AD-H: Autonomous Driving with Hierarchical Agents
Authors:
Zaibin Zhang,
Shiyu Tang,
Yuanhang Zhang,
Talas Fu,
Yifan Wang,
Yang Liu,
Dong Wang,
**g Shao,
Lijun Wang,
Huchuan Lu
Abstract:
Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent powers. As a result, the generalizability of these methods is highly restricted by the autonomous driving datasets used during fine-tuning. To tackle this challenge, we propose to connect high-level instructions and low-level control signals with mid-level language-driven commands, which are more fine-grained than high-level instructions but more universal and explainable than control signals, and thus can effectively bridge the gap in between. We implement this idea through a hierarchical multi-agent driving system named AD-H, comprising an MLLM planner for high-level reasoning and a lightweight controller for low-level execution. The hierarchical design frees the MLLM from low-level control signal decoding and therefore fully releases its emergent capabilities in high-level perception, reasoning, and planning. We build a new dataset with action hierarchy annotations. Comprehensive closed-loop evaluations demonstrate several key advantages of the proposed AD-H system. First, AD-H notably outperforms state-of-the-art methods in driving performance, even exhibiting self-correction capabilities during vehicle operation, a scenario not encountered in the training dataset. Second, AD-H demonstrates superior generalization under long-horizon instructions and novel environmental conditions, significantly surpassing current state-of-the-art methods. We will make our data and code publicly accessible at https://github.com/zhangzaibin/AD-H
Submitted 5 June, 2024;
originally announced June 2024.
-
Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?
Authors:
Kangyu Zheng,
Yingzhou Lu,
Zaixi Zhang,
Zhongwei Wan,
Yao Ma,
Marinka Zitnik,
Tianfan Fu
Abstract:
Currently, the field of structure-based drug design (SBDD) is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically compared models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark that evaluates sixteen models across these algorithmic foundations by assessing the pharmaceutical properties of the generated molecules and their docking affinities with specified target proteins. We highlight the unique advantages of each algorithmic approach and offer recommendations for the design of future SBDD models. We emphasize that 1D/2D ligand-centric drug design methods can be used in SBDD by treating the docking function as a black-box oracle, a possibility that is typically neglected. The empirical results show that 1D/2D methods achieve competitive performance compared with 3D-based methods that explicitly use the 3D structure of the target protein. Also, AutoGrow4, a 2D molecular graph-based genetic algorithm, dominates SBDD in terms of optimization ability. The relevant code is available at https://github.com/zkysfls/2024-sbdd-benchmark.
Submitted 4 June, 2024;
originally announced June 2024.
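The black-box-oracle idea can be sketched as a generic genetic algorithm that only ever calls a scoring function on candidates and never inspects that function. In this sketch, `toy_oracle` is a hypothetical stand-in for a docking program. The candidates are toy strings over a three-letter alphabet, not valid molecules. This is not AutoGrow4 or any of the benchmarked models. Elitism (carrying over the top quarter of the population each generation) ensures the best score never decreases.

```python
import random

ALPHABET = "CNO"  # toy "atoms"; these strings are not valid molecules

def toy_oracle(candidate: str) -> int:
    """Hypothetical stand-in for docking: positions matching a hidden target (higher is better)."""
    target = "CCOCCNC"
    return sum(a == b for a, b in zip(candidate, target))

def mutate(s: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a random symbol."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]

def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover of two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def optimize(oracle, length=7, pop_size=20, generations=30, seed=0):
    """Generic GA that treats `oracle` purely as a black box; returns (best, best-score history)."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=oracle, reverse=True)  # only oracle scores guide selection
        history.append(oracle(ranked[0]))
        elite = ranked[: pop_size // 4]                 # elitism keeps the best candidates
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=oracle), history

best, history = optimize(toy_oracle)
print(best, toy_oracle(best), history[0], history[-1])
```

Swapping `toy_oracle` for a real docking call would leave the optimizer unchanged. That is the point the abstract makes about ligand-centric methods, though real pipelines would also need valid molecular representations and mutation operators.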
-
Measurement of the branching fraction ratios $R(D^{+})$ and $R(D^{*+})$ using muonic $τ$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1063 additional authors not shown)
Abstract:
The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overlineν_τ$ and $\overline{B}^0\to D^{*+}τ^-\overlineν_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining $D^+$ mesons with $τ^-\toμ^-\overlineν_μν_τ$ candidates, where the $D^+$ is reconstructed via the $D^+\to K^-π^+π^+$ decay. The results are
\begin{align*}
R(D^{+}) &= 0.249 \pm 0.043 \pm 0.047, \\
R(D^{*+}) &= 0.402 \pm 0.081 \pm 0.085,
\end{align*}
where the first uncertainties are statistical and the second systematic. The two measurements have a correlation coefficient of $-0.39$ and are compatible with the Standard Model.
Submitted 5 June, 2024;
originally announced June 2024.
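When such results are used in fits or compared with predictions, a common convention is to combine statistical and systematic uncertainties in quadrature and build a covariance matrix from the quoted correlation. The sketch below does exactly that. It assumes the two uncertainty components are independent and that the quoted correlation of -0.39 applies to the total uncertainties. The abstract does not state either assumption.

```python
import math

def total_uncertainty(stat: float, syst: float) -> float:
    """Combine statistical and systematic uncertainties in quadrature (assumes independence)."""
    return math.hypot(stat, syst)

# Central values and (stat, syst) uncertainties as quoted in the abstract
r_d, sigma_d = 0.249, total_uncertainty(0.043, 0.047)
r_dstar, sigma_dstar = 0.402, total_uncertainty(0.081, 0.085)

# Assumption: the quoted correlation coefficient refers to the total uncertainties
rho = -0.39
cov = [
    [sigma_d**2, rho * sigma_d * sigma_dstar],
    [rho * sigma_d * sigma_dstar, sigma_dstar**2],
]

print(f"R(D+)  = {r_d:.3f} +/- {sigma_d:.3f}")        # -> R(D+)  = 0.249 +/- 0.064
print(f"R(D*+) = {r_dstar:.3f} +/- {sigma_dstar:.3f}")  # -> R(D*+) = 0.402 +/- 0.117
print("covariance:", cov)
```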
-
SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
Authors:
Xingrun Xing,
Zheng Zhang,
Ziyi Ni,
Shitao Xiao,
Yiming Ju,
Siqi Fan,
Yequan Wang,
Jiajun Zhang,
Guoqi Li
Abstract:
Towards energy-efficient artificial intelligence similar to the human brain, bio-inspired spiking neural networks (SNNs) offer the advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models have exhibited promising generalization capability, making it worthwhile to explore more general spike-driven models. However, the binary spikes in existing SNNs fail to encode adequate semantic information, posing technical challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. Unlike previous spikes with {0,1} levels, we propose a more general spike formulation with bi-directional, elastic amplitude, and elastic frequency encoding, while still maintaining the addition-based computation of SNNs. Within a single time step, the spike is enhanced with direction and amplitude information; for spike frequency, a strategy is designed to control the spike firing rate. We apply this elastic bi-spiking mechanism to language modeling, yielding SpikeLM. This is the first time general language tasks have been handled with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly narrows the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.
Submitted 5 June, 2024;
originally announced June 2024.
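To make the "addition-based computation" concrete, the sketch below uses ternary spikes in {-1, 0, +1} with a shared amplitude `alpha`. A weight-spike product then reduces to adding or subtracting weights, followed by a single multiplication by `alpha`. The fixed threshold and scalar `alpha` are illustrative assumptions. SpikeLM's elastic (learned) amplitude and its firing-rate control are not reproduced here.

```python
def bi_spike(xs, threshold):
    """Ternary spike direction: +1 at or above threshold, -1 at or below -threshold, else 0."""
    return [1 if x >= threshold else -1 if x <= -threshold else 0 for x in xs]

def spike_matvec(weights, spikes, alpha):
    """Compute W @ (alpha * s) with only additions/subtractions, then one scaling by alpha."""
    out = []
    for row in weights:
        acc = 0.0
        for w, s in zip(row, spikes):
            if s == 1:
                acc += w      # positive spike: add the weight
            elif s == -1:
                acc -= w      # negative spike: subtract the weight
        out.append(alpha * acc)
    return out

x = [0.9, -0.7, 0.1, -0.05]
spikes = bi_spike(x, threshold=0.5)   # [1, -1, 0, 0]
W = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
y = spike_matvec(W, spikes, alpha=0.8)

# Reference: ordinary multiply-accumulate with amplitude-scaled spikes gives the same result
y_ref = [sum(w * 0.8 * s for w, s in zip(row, spikes)) for row in W]
print(y, y_ref)
```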
-
Prompt-based Visual Alignment for Zero-shot Policy Transfer
Authors:
Haihan Gao,
Rui Zhang,
Qi Yi,
Hantao Yao,
Haochen Li,
Jiaming Guo,
Shaohui Peng,
Yunkai Gao,
QiCheng Wang,
Xing Hu,
Yuanbo Wen,
Zihao Zhang,
Zidong Du,
Ling Li,
Qi Guo,
Yunji Chen
Abstract:
Overfitting has become one of the main obstacles to applying reinforcement learning (RL). Existing methods do not provide an explicit semantic constraint for the feature extractor, which hinders the agent from learning a unified cross-domain representation and results in performance degradation on unseen domains. In addition, abundant data from multiple domains are needed. To address these issues, we propose prompt-based visual alignment (PVA), a robust framework that mitigates the detrimental domain bias in images for zero-shot policy transfer. Inspired by the fact that a Visual-Language Model (VLM) can serve as a bridge between text space and image space, we leverage the semantic information contained in a text sequence as an explicit constraint to train a visual aligner. The visual aligner can thus map images from multiple domains to a unified domain and achieve good generalization performance. To better depict semantic information, prompt tuning is applied to learn a sequence of learnable tokens. With explicit semantic constraints, PVA can learn a unified cross-domain representation under limited access to cross-domain data and achieves strong zero-shot generalization in unseen domains. We verify PVA on a vision-based autonomous driving task with the CARLA simulator. Experiments show that the agent generalizes well to unseen domains under limited access to multi-domain data.
Submitted 5 June, 2024;
originally announced June 2024.
-
Bayesian WeakS-to-Strong from Text Classification to Generation
Authors:
Ziyun Cui,
Ziyang Zhang,
Wen Wu,
Guangzhi Sun,
Chao Zhang
Abstract:
Advances in large language models raise the question of how alignment techniques will adapt as models become increasingly complex and humans will only be able to supervise them weakly. Weak-to-Strong mimics such a scenario, in which weak model supervision attempts to harness the full capabilities of a much stronger model. This work extends Weak-to-Strong to WeakS-to-Strong by exploring an ensemble of weak models that simulate the variability in human opinions. Confidence scores are estimated using a Bayesian approach to guide the WeakS-to-Strong generalization. Furthermore, we extend the application of WeakS-to-Strong from text classification tasks to text generation tasks, where more advanced strategies are investigated for supervision. Moreover, direct preference optimization is applied to advance the student model's preference learning beyond the basic learning framework of teacher forcing. Results demonstrate the effectiveness of the proposed approach in improving the reliability of a strong student model, showing potential for superalignment.
Submitted 24 May, 2024;
originally announced June 2024.
-
Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contributions from resonances decaying to $D^{\ast-}D^{+}$ and $D^{\ast+}D^{-}$ states linked by $C$ parity. This procedure allows the $C$-parities of resonances in the $D^{\ast\pm}D^{\mp}$ mass spectra to be determined. Four charmonium(-like) states are observed decaying into $D^{\ast\pm}D^{\mp}$: $η_c(3945)$, $h_c(4000)$, $χ_{c1}(4010)$ and $h_c(4300)$, with quantum numbers $J^{PC}$ equal to $0^{-+}$, $1^{+-}$, $1^{++}$ and $1^{+-}$, respectively. At least three of these states have not been observed previously. In addition, the existence of the $T_{\bar{c}\bar{s}0}^{*}(2870)^{0}$ and $T_{\bar{c}\bar{s}1}^{*}(2900)^{0}$ resonances in the $D^-K^+$ mass spectrum, already observed in the $B^+ \to D^+ D^- K^+$ decay, is confirmed in a different production channel.
Submitted 5 June, 2024;
originally announced June 2024.
-
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Authors:
Zicheng Zhang,
Haoning Wu,
Chunyi Li,
Yingjie Zhou,
Wei Sun,
Xiongkuo Min,
Zijian Chen,
Xiaohong Liu,
Weisi Lin,
Guangtao Zhai
Abstract:
How to accurately and efficiently assess AI-generated images (AIGIs) remains a critical challenge for generative models. Given the high costs and extensive time commitments required for user studies, many researchers have turned to large multi-modal models (LMMs) as AIGI evaluators, although their precision and validity remain questionable. Furthermore, traditional benchmarks mostly use naturally captured content rather than AIGIs to test the abilities of LMMs, leaving a noticeable gap for AIGIs. Therefore, in this paper we introduce A-Bench, a benchmark designed to diagnose whether LMMs are masters at evaluating AIGIs. Specifically, A-Bench is organized under two key principles: 1) emphasizing both high-level semantic understanding and low-level visual quality perception to address the intricate demands of AIGIs; 2) using various generative models for AIGI creation and various LMMs for evaluation, which ensures a comprehensive validation scope. Ultimately, 2,864 AIGIs from 16 text-to-image models are sampled, each paired with question-answer pairs annotated by human experts, and tested across 18 leading LMMs. We hope that A-Bench will significantly enhance the evaluation process and promote the generation quality of AIGIs. The benchmark is available at https://github.com/Q-Future/A-Bench.
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidence for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ is found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit on its branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at the 90% confidence level.
Submitted 5 June, 2024;
originally announced June 2024.
-
Unveiling a Family of Dimerized Quantum Magnets in Ternary Metal Borides
Authors:
Zhen Zhang,
Andrew P. Porter,
Yang Sun,
Kirill D. Belashchenko,
Gayatri Viswanathan,
Arka Sarkar,
Kirill Kovnir,
Kai-Ming Ho,
Vladimir Antropov
Abstract:
Dimerized quantum magnets are exotic crystalline materials in which Bose-Einstein condensation of magnetic excitations can occur. However, known dimerized quantum magnets are limited to only a few oxides and halides. Here, we unveil 9 dimerized quantum magnets and 11 conventional antiferromagnets in ternary metal borides MTB$_4$ (M = Sc, Y, La, Ce, Lu, Mg, Ca, Al; T = V, Cr, Mn, Fe, Co, Ni). In this type of structure, 3d transition-metal atoms T are arranged in dimers. Quantum magnetism in these compounds is dominated by strong antiferromagnetic interactions between Cr atoms (both Cr and Mn for M = Mg and Ca) within the structural dimers, with much weaker interactions between the dimers. These systems are proposed to be close to a quantum critical point between a disordered singlet spin-dimer phase, with a spin gap, and the ordered conventional Néel antiferromagnetic phase. This new family of dimerized quantum magnets greatly enriches the materials inventory that allows investigations of the spin-gap phase. All the quantum-, conventionally-, and non-magnetic systems identified, together with experimental synthesis methods for a phase suitable for characterization, provide a platform with abundant possibilities to tune the magnetic exchange coupling by doping and study this unconventional type of quantum phase transition. This work opens up new avenues for studying the quantum magnetism of spin dimers in borides and establishes a theoretical workflow for future searches for dimerized quantum magnets in other families or types of materials.
Submitted 5 June, 2024;
originally announced June 2024.
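The spin gap of the "disordered singlet spin-dimer phase" mentioned in this abstract can be seen in the textbook limit of an isolated Heisenberg dimer, where the weak inter-dimer coupling is neglected. The derivation below assumes two equal spins $s$ coupled by an antiferromagnetic exchange $J > 0$. It is a generic illustration, not the model or the parameters computed in the paper.

```latex
\begin{align*}
H &= J\,\mathbf{S}_1\cdot\mathbf{S}_2, \qquad J > 0, \qquad \mathbf{S} = \mathbf{S}_1 + \mathbf{S}_2, \\
\mathbf{S}_1\cdot\mathbf{S}_2 &= \tfrac{1}{2}\left[S(S+1) - 2s(s+1)\right], \qquad S = 0, 1, \dots, 2s, \\
E(S) &= \tfrac{J}{2}\left[S(S+1) - 2s(s+1)\right], \\
\Delta &= E(1) - E(0) = J, \\
s = \tfrac{1}{2}: \quad E(0) &= -\tfrac{3}{4}J \ \text{(singlet)}, \qquad E(1) = \tfrac{1}{4}J \ \text{(triplet)}.
\end{align*}
```

Adding a weak inter-dimer coupling $J'$ spreads the $S = 1$ excitations into a band. In the standard coupled-dimer picture, the gap survives while $J'/J$ is small and closes at a critical ratio, beyond which Néel order sets in. This is the kind of quantum critical point the abstract refers to.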
-
Development of an underwater inductive coupling communication system with power carrier technology
Authors:
Zhongxing Zhang
Abstract:
Inductive coupling communication is one of the main methods for underwater communication systems owing to its excellent overall performance. However, its data transmission distance and operational power consumption need further improvement. In this paper, an underwater inductive coupling communication scheme based on power carrier technology is proposed to improve the transmission speed and reduce the bit error rate. An ultra-low-power STM32L-series microcontroller was employed as the core of the system. Through construction and simulation of the communication channel, the optimal parameters were determined. Based on the circuit model of power carrier communication, the effects of different modulation and demodulation methods on signal transmission quality were discussed, demonstrating the superiority of Differential Phase Shift Keying (DPSK). The device was developed with system-level low-power design and onboard communication quality optimization. Test results in a laboratory environment show that the system achieves efficient data communication at 115200 bps with static power consumption as low as 660 μA over a 700 m channel. This study provides a practical design approach for high-speed communication and low-power operation of underwater communication systems.
Submitted 4 June, 2024;
originally announced June 2024.
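The abstract reports that DPSK performed best but does not say why. One relevant textbook property is that DPSK encodes each bit in the phase change between consecutive symbols, so the receiver needs no absolute carrier-phase reference. The baseband sketch below illustrates that property using symbols of ±1, with no real carrier, noise model, or microcontroller code. It is not the system described in the paper, and the channel model (attenuation plus a phase inversion) is a hypothetical example.

```python
def dpsk_modulate(bits, reference=1):
    """Binary DPSK at baseband: bit 1 flips the carrier phase by pi (sign change), bit 0 keeps it."""
    symbols = [reference]  # the reference symbol carries no data
    for b in bits:
        symbols.append(-symbols[-1] if b else symbols[-1])
    return symbols

def dpsk_demodulate(received):
    """Detect a phase flip between consecutive symbols; no absolute phase reference is needed."""
    return [1 if prev * cur < 0 else 0 for prev, cur in zip(received, received[1:])]

bits = [1, 0, 1, 1, 0, 0, 1]
tx = dpsk_modulate(bits)
# Hypothetical channel: attenuation plus an unknown pi phase offset (sign inversion)
rx = [-0.6 * s for s in tx]
print(dpsk_demodulate(rx))  # -> [1, 0, 1, 1, 0, 0, 1]
```

A coherent BPSK receiver facing the same sign inversion would invert every bit unless it first recovered the carrier phase. DPSK avoids that step at the cost of needing one extra reference symbol.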
-
A generalized cycle benchmarking algorithm for characterizing mid-circuit measurements
Authors:
Zhihan Zhang,
Senrui Chen,
Yunchao Liu,
Liang Jiang
Abstract:
Mid-circuit measurement (MCM) is a crucial ingredient in the development of fault-tolerant quantum computation. While there has been rapid experimental progress in realizing MCM, a systematic method for characterizing noisy MCM is still under exploration. In this work we develop an algorithm to characterize noisy MCM via a generalization of cycle benchmarking, a standard approach for characterizing the Pauli noise channel of Clifford gates. The key idea is to use a joint Fourier transform on the classical and quantum registers and then estimate parameters in the Fourier space, analogous to the Pauli fidelities used in cycle benchmarking. Furthermore, we develop a theory of the noise learnability of MCM, which determines what information about the noise model can be learned (in the presence of state preparation and measurement noise) and what cannot, and which shows that all learnable information can be learned using our algorithm. As an application, we show how to use the learned information to test the independence between measurement noise and state preparation noise in an MCM. Finally, we conduct numerical simulations to illustrate the practical applicability of the algorithm. Similar to cycle benchmarking, we expect the algorithm to provide a useful toolkit of experimental interest.
Submitted 4 June, 2024;
originally announced June 2024.
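Standard cycle benchmarking is the starting point of this abstract. It estimates a Pauli fidelity $f$ by fitting expectation values that decay as $A f^m$ with circuit depth $m$. The prefactor $A$ absorbs state-preparation and measurement (SPAM) error, which makes the estimate of $f$ SPAM-robust. The sketch below shows only that generic fitting step, using log-linear least squares on synthetic, noise-free data. The values A = 0.9 and f = 0.98 are made up. The sketch does not implement the paper's joint classical-quantum Fourier transform.

```python
import math

def fit_exponential_decay(depths, values):
    """Fit values ≈ A * f**m by least squares on log(values); returns (A, f).
    Requires all values to be positive."""
    ys = [math.log(v) for v in values]
    n = len(depths)
    mean_x = sum(depths) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in depths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depths, ys))
    slope = sxy / sxx                      # log(f)
    intercept = mean_y - slope * mean_x    # log(A), the SPAM-dependent prefactor
    return math.exp(intercept), math.exp(slope)

# Synthetic, noise-free data: SPAM factor A = 0.9, per-cycle fidelity f = 0.98 (made-up values)
depths = [2, 4, 8, 16, 32]
values = [0.9 * 0.98 ** m for m in depths]
A, f = fit_exponential_decay(depths, values)
print(f"A = {A:.4f}, f = {f:.4f}")  # -> A = 0.9000, f = 0.9800
```

Changing the SPAM factor only shifts the intercept, so the fitted `f` is unchanged. Real data would need repeated random circuits and uncertainty estimates on the fit.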
-
Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation
Authors:
Jiayao Tan,
Fan Lyu,
Chenggong Ni,
Tingliang Feng,
Fuyuan Hu,
Zhang Zhang,
Shaochuang Zhao,
Liang Wang
Abstract:
Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods construct pseudo-labels for all samples and update the model through self-training. However, these pseudo-labels are often noisy, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to continually select appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning: initialization, growth, and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Extensive experiments show that PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.
Submitted 3 June, 2024;
originally announced June 2024.
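The abstract does not give PLF's exact thresholding rule. The sketch below is a generic confidence-threshold filter loosely following its "initialization" and "growth" principles. The threshold starts from a chosen value and is updated as an exponential moving average of the batch's mean max-probability, so it rises as the model grows more confident. This update is in the style of FreeMatch's self-adaptive thresholding. Only predictions at or above the threshold are kept as pseudo-labels. The initial value 0.5 and momentum 0.9 are assumptions. The diversity principle and Class Prior Alignment are not modeled.

```python
def update_threshold(tau, confidences, momentum=0.9):
    """EMA update (hypothetical rule): move the threshold toward the batch's mean max-probability."""
    mean_conf = sum(confidences) / len(confidences)
    return momentum * tau + (1 - momentum) * mean_conf

def filter_pseudo_labels(probs, tau):
    """Return (sample_index, predicted_class) for predictions whose confidence is >= tau."""
    kept = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= tau:
            kept.append((i, p.index(conf)))
    return kept

tau = 0.5  # hypothetical initial threshold ("initialization")
batch = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.7, 0.2]]  # softmax outputs
tau = update_threshold(tau, [max(p) for p in batch])  # "growth" as confidence rises
print(round(tau, 4), filter_pseudo_labels(batch, tau))  # -> 0.5167 [(0, 0), (2, 1)]
```

The second sample (max probability 0.4) falls below the updated threshold, so it is excluded from self-training.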
-
Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding
Authors:
Zhihan Zhang,
Yixin Cao,
Chenchen Ye,
Yunshan Ma,
Lizi Liao,
Tat-Seng Chua
Abstract:
The digital landscape is rapidly evolving with an ever-increasing volume of online news, emphasizing the need for swift and precise analysis of complex events. We refer to complex events composed of many news articles over an extended period as Temporal Complex Events (TCEs). This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within a TCE, characterized by key points and timestamps. We establish a benchmark, named TCELongBench, to evaluate the proficiency of LLMs in handling temporal dynamics and understanding extensive text. This benchmark encompasses three distinct tasks: reading comprehension, temporal sequencing, and future event forecasting. In the experiments, we leverage the retrieval-augmented generation (RAG) method and LLMs with long context windows to deal with the lengthy news articles of TCEs. Our findings indicate that models with suitable retrievers exhibit performance comparable to those utilizing long context windows.
Submitted 4 June, 2024;
originally announced June 2024.