-
DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction
Authors:
Bowen Song,
Jason Hu,
Zhaoxu Luo,
Jeffrey A. Fessler,
Liyue Shen
Abstract:
Diffusion models face significant challenges when employed for large-scale medical image reconstruction in real practice such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, it is difficult to train a diffusion model directly on the entire volume of high-dimensional data to obtain an efficient 3D diffusion prior. Existing works utilizing diffusion priors o…
▽ More
Diffusion models face significant challenges when employed for large-scale medical image reconstruction in real practice such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, it is difficult to train a diffusion model directly on the entire volume of high-dimensional data to obtain an efficient 3D diffusion prior. Existing works utilizing diffusion priors on single 2D image slice with hand-crafted cross-slice regularization would sacrifice the z-axis consistency, which results in severe artifacts along the z-axis. In this work, we propose a novel framework that enables learning the 3D image prior through position-aware 3D-patch diffusion score blending for reconstructing large-scale 3D medical images. To the best of our knowledge, we are the first to utilize a 3D-patch diffusion prior for 3D medical image reconstruction. Extensive experiments on sparse view and limited angle CT reconstruction show that our DiffusionBlend method significantly outperforms previous methods and achieves state-of-the-art performance on real-world CT reconstruction problems with high-dimensional 3D image (i.e., $256 \times 256 \times 500$). Our algorithm also comes with better or comparable computational efficiency than previous state-of-the-art methods.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the…
▽ More
Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Investigation of reflection-based measurements of microwave kinetic inductance detectors in the optical bands
Authors:
Jie Hu,
Faouzi Boussaha,
Paul Nicaise,
Christine Chaumont,
Maria Appavou,
Viet Dung Pham,
Michel Piat
Abstract:
In this paper, we investigate the single photon response from the reflection of the Microwave Kinetic Inductance Detector (MKID) array. Reflection measurements are carried out using two configurations: one is measured simultaneously with the transmission, and the other is obtained with a single-ended MKID array terminated with an open load. Compared with the transmission, reflection measurements s…
▽ More
In this paper, we investigate the single photon response from the reflection of the Microwave Kinetic Inductance Detector (MKID) array. Reflection measurements are carried out using two configurations: one is measured simultaneously with the transmission, and the other is obtained with a single-ended MKID array terminated with an open load. Compared with the transmission, reflection measurements significantly reduce the readout noise of the single-ended MKID array. This is also reflected in the improvement of the median energy resolving power by around 20%-30% under pulsed photon illumination at $λ= 405$~nm, mainly due to an increase in the size of the resonance circle on the IQ plane. This method has the potential to be used to read out large MKID arrays.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Authors:
Roman Bachmann,
Oğuzhan Fatih Kar,
David Mizrahi,
Ali Garjani,
Mingfei Gao,
David Griffiths,
Jiaming Hu,
Afshin Dehghan,
Amir Zamir
Abstract:
Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. In this paper, we expand upon the capabilities of them by training a single model on tens of highly diverse moda…
▽ More
Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. In this paper, we expand upon the capabilities of them by training a single model on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora. This includes training on several semantic and geometric modalities, feature maps from recent state of the art models like DINOv2 and ImageBind, pseudo labels of specialist models like SAM and 4DHumans, and a range of new modalities that allow for novel ways to interact with the model and steer the generation, for example image metadata or color palettes. A crucial step in this process is performing discrete tokenization on various modalities, whether they are image-like, neural network feature maps, vectors, structured data like instance segmentation or human poses, or data that can be represented as text. Through this, we expand on the out-of-the-box capabilities of multimodal models and specifically show the possibility of training one model to solve at least 3x more tasks/modalities than existing ones and doing so without a loss in performance. This enables more fine-grained and controllable multimodal generation capabilities and allows us to study the distillation of models trained on diverse data and objectives into a unified model. We successfully scale the training to a three billion parameter model using tens of modalities and different datasets. The resulting models and training code are open sourced at 4m.epfl.ch.
△ Less
Submitted 14 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
Authors:
Yucheng Han,
Rui Wang,
Chi Zhang,
Juntao Hu,
Pei Cheng,
Bin Fu,
Hanwang Zhang
Abstract:
Recent advancements in image generation have enabled the creation of high-quality images from text conditions. However, when facing multi-modal conditions, such as text combined with reference appearances, existing methods struggle to balance multiple conditions effectively, typically showing a preference for one modality over others. To address this challenge, we introduce EMMA, a novel image gen…
▽ More
Recent advancements in image generation have enabled the creation of high-quality images from text conditions. However, when facing multi-modal conditions, such as text combined with reference appearances, existing methods struggle to balance multiple conditions effectively, typically showing a preference for one modality over others. To address this challenge, we introduce EMMA, a novel image generation model accepting multi-modal prompts built upon the state-of-the-art text-to-image (T2I) diffusion model, ELLA. EMMA seamlessly incorporates additional modalities alongside text to guide image generation through an innovative Multi-modal Feature Connector design, which effectively integrates textual and supplementary modal information using a special attention mechanism. By freezing all parameters in the original T2I diffusion model and only adjusting some additional layers, we reveal an interesting finding that the pre-trained T2I diffusion model can secretly accept multi-modal prompts. This interesting property facilitates easy adaptation to different existing frameworks, making EMMA a flexible and effective tool for producing personalized and context-aware images and even videos. Additionally, we introduce a strategy to assemble learned EMMA modules to produce images conditioned on multiple modalities simultaneously, eliminating the need for additional training with mixed multi-modal prompts. Extensive experiments demonstrate the effectiveness of EMMA in maintaining high fidelity and detail in generated images, showcasing its potential as a robust solution for advanced multi-modal conditional image generation tasks.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Research on Early Warning Model of Cardiovascular Disease Based on Computer Deep Learning
Authors:
Yuxiang Hu,
**xin Hu,
Ting Xu,
Bo Zhang,
Jiajie Yuan,
Haozhang Deng
Abstract:
This project intends to study a cardiovascular disease risk early warning model based on one-dimensional convolutional neural networks. First, the missing values of 13 physiological and symptom indicators such as patient age, blood glucose, cholesterol, and chest pain were filled and Z-score was standardized. The convolutional neural network is converted into a 2D matrix, the convolution function…
▽ More
This project intends to study a cardiovascular disease risk early warning model based on one-dimensional convolutional neural networks. First, the missing values of 13 physiological and symptom indicators such as patient age, blood glucose, cholesterol, and chest pain were filled and Z-score was standardized. The convolutional neural network is converted into a 2D matrix, the convolution function of 1,3, and 5 is used for the first-order convolution operation, and the Max Pooling algorithm is adopted for dimension reduction. Set the learning rate and output rate. It is optimized by the Adam algorithm. The result of classification is output by a soft classifier. This study was conducted based on Statlog in the UCI database and heart disease database respectively. The empirical data indicate that the forecasting precision of this technique has been enhanced by 11.2%, relative to conventional approaches, while there is a significant improvement in the logarithmic curve fitting. The efficacy and applicability of the novel approach are corroborated through the examination employing a one-dimensional convolutional neural network.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Improving Adversarial Robustness via Feature Pattern Consistency Constraint
Authors:
Jiacong Hu,
**gwen Ye,
Zunlei Feng,
Jiazhen Yang,
Shunyu Liu,
Xiaotian Yu,
Lingxiang Jia,
Mingli Song
Abstract:
Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns. In response to these threats, various defense methods have emerged to bolster the model's robustness. However, most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate…
▽ More
Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns. In response to these threats, various defense methods have emerged to bolster the model's robustness. However, most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference, inevitably increasing computational burdens. Conversely, clean training, which strengthens the model's robustness by relying solely on clean examples, can address the aforementioned issues. In this paper, we align with this methodological stream and enhance its generalizability to unknown adversarial examples. This enhancement is achieved by scrutinizing the behavior of latent features within the network. Recognizing that a correct prediction relies on the correctness of the latent feature's pattern, we introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature's capacity to maintain the correct feature pattern. Specifically, we propose Spatial-wise Feature Modification and Channel-wise Feature Selection to enhance latent features. Subsequently, we employ the Pattern Consistency Loss to constrain the similarity between the feature pattern of the latent features and the correct feature pattern. Our experiments demonstrate that the FPCC method empowers latent features to uphold correct feature patterns even in the face of adversarial examples, resulting in inherent adversarial robustness surpassing state-of-the-art models.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data
Authors:
Jiaojiao Zhang,
Jiang Hu,
Anthony Man-Cho So,
Mikael Johansson
Abstract:
Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated l…
▽ More
Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated learning over a compact smooth submanifold in the setting of heterogeneous client data. We propose an algorithm that leverages stochastic Riemannian gradients and a manifold projection operator to improve computational efficiency, uses local updates to improve communication efficiency, and avoids client drift. Theoretically, we show that our proposed algorithm converges sub-linearly to a neighborhood of a first-order optimal solution by using a novel analysis that jointly exploits the manifold structure and properties of the loss functions. Numerical experiments demonstrate that our algorithm has significantly smaller computational and communication overhead than existing methods.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (636 additional authors not shown)
Abstract:
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur…
▽ More
Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database
Authors:
Wanling Gao,
Yuan Liu,
Zhuoming Yu,
Dandan Cui,
Wen**g Liu,
Xiaoshuang Liang,
Jiahui Zhao,
Jiyue Xie,
Hao Li,
Li Ma,
Ning Ye,
Yumiao Kang,
Dingfeng Luo,
Peng Pan,
Wei Huang,
Zhongmou Liu,
Jizhong Hu,
Fan Huang,
Gangyuan Zhao,
Chongrong Jiang,
Tianyi Wei,
Zhifei Zhang,
Yunyou Huang,
Jianfeng Zhan
Abstract:
Artificial Intelligence (AI) plays a crucial role in medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f…
▽ More
Artificial Intelligence (AI) plays a crucial role in medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI from being translated into medical practice. To address this gap, we have curated a groundbreaking database called AI.vs.Clinician. This database is the first of its kind for studying the interactions between AI and clinicians. It derives from 7,500 collaborative diagnosis records on a life-threatening medical emergency -- Sepsis -- from 14 medical centers across China. For the patient cohorts well-chosen from MIMIC databases, the AI-related information comprises the model property, feature input, diagnosis decision, and inferred probabilities of sepsis onset presently and within next three hours. The clinician-related information includes the viewed examination data and sequence, viewed time, preliminary and final diagnosis decisions with or without AI assistance, and recommended treatment.
△ Less
Submitted 15 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results
Authors:
Xin **,
Chunle Guo,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangchen Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Ruoqi Li,
Chang Liu,
Ziyi Wang,
Yao Du,
**g**g Yang,
Long Bao,
Heng Sun,
Xiangyu Kong,
Xiaoxia Xing,
**long Wu,
Yuanyang Xue,
Hyunhee Park,
Sejun Song,
Changho Kim,
**gfan Tan
, et al. (17 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…
▽ More
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Few-shot RAW Image Denoising track on MIPI 2024. In total, 165 participants were successfully registered, and 7 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art erformance on Few-shot RAW Image Denoising. More details of this challenge and the link to the dataset can be found at https://mipichallenge.org/MIPI2024.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Ensemble Model With Bert,Roberta and Xlnet For Molecular property prediction
Authors:
Junling Hu
Abstract:
This paper presents a novel approach for predicting molecular properties with high accuracy without the need for extensive pre-training. Employing ensemble learning and supervised fine-tuning of BERT, RoBERTa, and XLNet, our method demonstrates significant effectiveness compared to existing advanced models. Crucially, it addresses the issue of limited computational resources faced by experimental…
▽ More
This paper presents a novel approach for predicting molecular properties with high accuracy without the need for extensive pre-training. Employing ensemble learning and supervised fine-tuning of BERT, RoBERTa, and XLNet, our method demonstrates significant effectiveness compared to existing advanced models. Crucially, it addresses the issue of limited computational resources faced by experimental groups, enabling them to accurately predict molecular properties. This innovation provides a cost-effective and resource-efficient solution, potentially advancing further research in the molecular domain.
△ Less
Submitted 30 May, 2024;
originally announced June 2024.
-
Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The wea…
▽ More
The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$,…
▽ More
We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance
Authors:
Zhan Zhang,
Qin Zhang,
Yang Jiao,
Lin Lu,
Lin Ma,
Aihua Liu,
Xiao Liu,
Juan Zhao,
Yajun Xue,
Bing Wei,
Mingxia Zhang,
Ru Gao,
Hong Zhao,
Jie Lu,
Fan Li,
Yang Zhang,
Yiming Wang,
Lei Zhang,
Fengwei Tian,
Jie Hu,
Xin Gou
Abstract:
AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost…
▽ More
AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in real-world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precisions were no less than 95%, in which the diagnostic precision for every disease including uncommon ones was no less than 80%. After verifications, the 46 DUCG models were applied in the real-world in China. Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Due to DUCG's transparency, the mistakes causing the incorrect diagnoses were found and corrected. The diagnostic abilities of the clinicians who applied DUCG frequently were improved significantly. Following the introduction to the earlier presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
QGEval: A Benchmark for Question Generation Evaluation
Authors:
Wei** Fu,
Bifan Wei,
Jianxiang Hu,
Zhongmin Cai,
Jun Liu
Abstract:
Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, requiring a reliable and comprehensive evaluation of their quality. Human evaluation is frequently used in the field of question generation (QG) and is one of the most accurate evaluation methods. It also serves as the standard for automatic metrics. However, there is a lack of unified…
▽ More
Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, requiring a reliable and comprehensive evaluation of their quality. Human evaluation is frequently used in the field of question generation (QG) and is one of the most accurate evaluation methods. It also serves as the standard for automatic metrics. However, there is a lack of unified evaluation criteria, which hampers the development of both QG technologies and automatic evaluation methods. To address this, we propose QGEval, a multi-dimensional Evaluation benchmark for Question Generation, which evaluates both generated questions and existing automatic metrics across 7 dimensions: fluency, clarity, conciseness, relevance, consistency, answerability, and answer consistency. We demonstrate the appropriateness of these dimensions by examining their correlations and distinctions. Analysis with QGEval reveals that 1) most QG models perform unsatisfactorily in terms of answerability and answer consistency, and 2) existing metrics fail to align well with human assessments when evaluating generated questions across the 7 dimensions. We expect this work to foster the development of both QG technologies and automatic metrics for QG.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Efficient Beamforming Feedback Information-Based Wi-Fi Sensing by Feature Selection
Authors:
Xin Li,
**gzhi Hu,
Jun Luo
Abstract:
Wi-Fi sensing leveraging plain-text beamforming feedback information (BFI) in multiple-input-multiple-output (MIMO) systems attracts increasing attention. However, due to the implicit relationship between BFI and the channel state information (CSI), quantifying the sensing capability of BFI poses a challenge in building efficient BFI-based sensing algorithms. In this letter, we first derive a math…
▽ More
Wi-Fi sensing leveraging plain-text beamforming feedback information (BFI) in multiple-input-multiple-output (MIMO) systems attracts increasing attention. However, due to the implicit relationship between BFI and the channel state information (CSI), quantifying the sensing capability of BFI poses a challenge in building efficient BFI-based sensing algorithms. In this letter, we first derive a mathematical model of BFI, characterizing its relationship with CSI explicitly, and then develop a closed-form expression of BFI for 2x2 MIMO systems. To enhance the efficiency of BFI-based sensing by selecting only the most informative features, we quantify the sensing capacity of BFI using the Cramer-Rao bound (CRB) and then propose an efficient CRB-based BFI feature selection algorithm. Simulation results verify that BFI and CSI exhibit comparable sensing capabilities and that the proposed algorithm halves the number of features, reducing 20% more parameters than baseline methods, at the cost of only slightly increasing positioning errors.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Authors:
Zanlin Ni,
Yulin Wang,
Ren** Zhou,
Jiayi Guo,
**yi Hu,
Zhiyuan Liu,
Shiji Song,
Yuan Yao,
Gao Huang
Abstract:
The field of image synthesis is currently flourishing due to the advancements in diffusion models. While diffusion models have been successful, their computational intensity has prompted the pursuit of more efficient alternatives. As a representative work, non-autoregressive Transformers (NATs) have been recognized for their rapid generation. However, a major drawback of these models is their infe…
▽ More
The field of image synthesis is currently flourishing due to the advancements in diffusion models. While diffusion models have been successful, their computational intensity has prompted the pursuit of more efficient alternatives. As a representative work, non-autoregressive Transformers (NATs) have been recognized for their rapid generation. However, a major drawback of these models is their inferior performance compared to diffusion models. In this paper, we aim to re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies. Specifically, we identify the complexities in properly configuring these strategies and indicate the possible sub-optimality in existing heuristic-driven designs. Recognizing this, we propose to go beyond existing methods by directly solving the optimal strategies in an automatic framework. The resulting method, named AutoNAT, advances the performance boundaries of NATs notably, and is able to perform comparably with the latest diffusion models at a significantly reduced inference cost. The effectiveness of AutoNAT is validated on four benchmark datasets, i.e., ImageNet-256 & 512, MS-COCO, and CC3M. Our code is available at https://github.com/LeapLabTHU/ImprovedNAT.
△ Less
Submitted 8 June, 2024;
originally announced June 2024.
-
End-to-End Design of Polar Coded Integrated Data and Energy Networking
Authors:
Jie Hu,
**gwen Cui,
Kun Yang
Abstract:
In order to transmit data and transfer energy to the low-power Internet of Things (IoT) devices, integrated data and energy networking (IDEN) system may be harnessed. In this context, we propose a bitwise end-to-end design for polar coded IDEN systems, where the conventional encoding/decoding, modulation/demodulation, and energy harvesting (EH) modules are replaced by the neural networks (NNs). In…
▽ More
In order to transmit data and transfer energy to the low-power Internet of Things (IoT) devices, integrated data and energy networking (IDEN) system may be harnessed. In this context, we propose a bitwise end-to-end design for polar coded IDEN systems, where the conventional encoding/decoding, modulation/demodulation, and energy harvesting (EH) modules are replaced by the neural networks (NNs). In this way, the entire system can be treated as an AutoEncoder (AE) and trained in an end-to-end manner. Hence achieving global optimization. Additionally, we improve the common NN-based belief propagation (BP) decoder by adding an extra hypernetwork, which generates the corresponding NN weights for the main network under different number of iterations, thus the adaptability of the receiver architecture can be further enhanced. Our numerical results demonstrate that our BP-based end-to-end design is superior to conventional BP-based counterparts in terms of both the BER and power transfer, but it is inferior to the successive cancellation list (SCL)-based conventional IDEN system, which may be due to the inherent performance gap between the BP and SCL decoders.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
An Inexact Bregman Proximal Difference-of-Convex Algorithm with Two Types of Relative Stop** Criteria
Authors:
Lei Yang,
**g**g Hu,
Kim-Chuan Toh
Abstract:
In this paper, we consider a class of difference-of-convex (DC) optimization problems, where the global Lipschitz gradient continuity assumption on the smooth part of the objective function is not required. Such problems are prevalent in many contemporary applications such as compressed sensing, statistical regression, and machine learning, and can be solved by a general Bregman proximal DC algori…
▽ More
In this paper, we consider a class of difference-of-convex (DC) optimization problems, where the global Lipschitz gradient continuity assumption on the smooth part of the objective function is not required. Such problems are prevalent in many contemporary applications such as compressed sensing, statistical regression, and machine learning, and can be solved by a general Bregman proximal DC algorithm (BPDCA). However, the existing BPDCA is developed based on the stringent requirement that the involved subproblems must be solved exactly, which is often impractical and limits the applicability of the BPDCA. To facilitate the practical implementations and wider applications of the BPDCA, we develop an inexact Bregman proximal difference-of-convex algorithm (iBPDCA) by incorporating two types of relative-type stop** criteria for solving the subproblems. The proposed inexact framework has considerable flexibility to encompass many existing exact and inexact methods, and can accommodate different types of errors that may occur when solving the subproblem. This enables the potential application of our inexact framework across different DC decompositions to facilitate the design of a more efficient DCA scheme in practice. The global subsequential convergence and the global sequential convergence of our iBPDCA are established under suitable conditions including the Kurdyka-Łojasiewicz property. Some numerical experiments on the $\ell_{1-2}$ regularized least squares problem and the constrained $\ell_{1-2}$ sparse optimization problem are conducted to show the superior performance of our iBPDCA in comparison to existing algorithms. These results also empirically verify the necessity of develo** different types of stop** criteria to facilitate the efficient computation of the subproblem in each iteration of our iBPDCA.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Authors:
Yang Sui,
Yanyu Li,
Anil Kag,
Yerlan Idelbayev,
Junli Cao,
Ju Hu,
Dhritiman Sagar,
Bo Yuan,
Sergey Tulyakov,
Jian Ren
Abstract:
Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this wor…
▽ More
Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
On Limitation of Transformer for Learning HMMs
Authors:
Jiachen Hu,
Qinghua Liu,
Chi **
Abstract:
Despite the remarkable success of Transformer-based architectures in various sequential modeling tasks, such as natural language processing, computer vision, and robotics, their ability to learn basic sequential models, like Hidden Markov Models (HMMs), is still unclear. This paper investigates the performance of Transformers in learning HMMs and their variants through extensive experimentation an…
▽ More
Despite the remarkable success of Transformer-based architectures in various sequential modeling tasks, such as natural language processing, computer vision, and robotics, their ability to learn basic sequential models, like Hidden Markov Models (HMMs), is still unclear. This paper investigates the performance of Transformers in learning HMMs and their variants through extensive experimentation and compares them to Recurrent Neural Networks (RNNs). We show that Transformers consistently underperform RNNs in both training speed and testing accuracy across all tested HMM models. There are even challenging HMM instances where Transformers struggle to learn, while RNNs can successfully do so. Our experiments further reveal the relation between the depth of Transformers and the longest sequence length it can effectively learn, based on the types and the complexity of HMMs. To address the limitation of transformers in modeling HMMs, we demonstrate that a variant of the Chain-of-Thought (CoT), called $\textit{block CoT}$ in the training phase, can help transformers to reduce the evaluation error and to learn longer sequences at a cost of increasing the training time. Finally, we complement our empirical findings by theoretical results proving the expressiveness of transformers in approximating HMMs with logarithmic depth.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
TwinS: Revisiting Non-Stationarity in Multivariate Time Series Forecasting
Authors:
Jiaxi Hu,
Qingsong Wen,
Sijie Ruan,
Li Liu,
Yuxuan Liang
Abstract:
Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not solely limited to time-varying statistical properties highlighted by non-sta…
▽ More
Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not solely limited to time-varying statistical properties highlighted by non-stationary Transformer but also encompass three key aspects: nested periodicity, absence of periodic distributions, and hysteresis among time variables. In this paper, we begin by validating this theory through wavelet analysis and propose the Transformer-based TwinS model, which consists of three modules to address the non-stationary periodic distributions: Wavelet Convolution, Period-Aware Attention, and Channel-Temporal Mixed MLP. Specifically, The Wavelet Convolution models nested periods by scaling the convolution kernel size like wavelet transform. The Period-Aware Attention guides attention computation by generating period relevance scores through a convolutional sub-network. The Channel-Temporal Mixed MLP captures the overall relationships between time series through channel-time mixing learning. TwinS achieves SOTA performance compared to mainstream TS models, with a maximum improvement in MSE of 25.8\% over PatchTST.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurement of the branching fraction ratios $R(D^{+})$ and $R(D^{*+})$ using muonic $τ$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1063 additional authors not shown)
Abstract:
The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overlineν_τ$ and $\overline{B}^0\to D^{*+}τ^-\overlineν_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining…
▽ More
The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overlineν_τ$ and $\overline{B}^0\to D^{*+}τ^-\overlineν_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining $D^+$ mesons with $τ^-\toμ^-\overlineν_μν_τ$ candidates, where the $D^+$ is reconstructed via the $D^+\to K^-π^+π^+$ decay. The results are
\begin{align*}
R(D^{+}) &= 0.249 \pm 0.043 \pm 0.047,
R(D^{*+}) &= 0.402 \pm 0.081\pm 0.085,
\end{align*}
where the first uncertainties are statistical and the second systematic. The two measurements have a correlation coefficient of $-0.39$ and are compatible with the Standard Model.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contribu…
▽ More
A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contributions from resonances decaying to $D^{\ast-}D^{+}$ and $D^{\ast+}D^{-}$ states linked by $C$ parity. This procedure allows the $C$-parities of resonances in the $D^{\ast\pm}D^{\mp}$ mass spectra to be determined. Four charmonium(-like) states are observed decaying into $D^{\ast\pm}D^{\mp}$: $η_c(3945)$, $h_c(4000)$, $χ_{c1}(4010)$ and $h_c(4300)$, with quantum numbers $J^{PC}$ equal to $0^{-+}$, $1^{+-}$, $1^{++}$ and $1^{+-}$, respectively. At least three of these states have not been observed previously. In addition, the existence of the $T_{\bar{c}\bar{s}0}^{*}(2870)^{0}$ and $T_{\bar{c}\bar{s}1}^{*}(2900)^{0}$ resonances in the $D^-K^+$ mass spectrum, already observed in the $B^+ \to D^+ D^- K^+$ decay, is confirmed in a different production channel.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
A Combination Model for Time Series Prediction using LSTM via Extracting Dynamic Features Based on Spatial Smoothing and Sequential General Variational Mode Decomposition
Authors:
Jianyu Liu,
Wei Chen,
Yong Zhang,
Zhenfeng Chen,
Bin Wan,
**wei Hu
Abstract:
In order to solve the problems such as difficult to extract effective features and low accuracy of sales volume prediction caused by complex relationships such as market sales volume in time series prediction, we proposed a time series prediction method of market sales volume based on Sequential General VMD and spatial smoothing Long short-term memory neural network (SS-LSTM) combination model. Fi…
▽ More
In order to solve the problems such as difficult to extract effective features and low accuracy of sales volume prediction caused by complex relationships such as market sales volume in time series prediction, we proposed a time series prediction method of market sales volume based on Sequential General VMD and spatial smoothing Long short-term memory neural network (SS-LSTM) combination model. Firstly, the spatial smoothing algorithm is used to decompose and calculate the sample data of related industry sectors affected by the linkage effect of market sectors, extracting modal features containing information via Sequential General VMD on overall market and specific price trends; Then, according to the background of different Market data sets, LSTM network is used to model and predict the price of fundamental data and modal characteristics. The experimental results of data prediction with seasonal and periodic trends show that this method can achieve higher price prediction accuracy and more accurate accuracy in specific market contexts compared to traditional prediction methods Describe the changes in market sales volume.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models
Authors:
Jerry Yao-Chieh Hu,
Maojiang Su,
En-Jui Kuo,
Zhao Song,
Han Liu
Abstract:
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of n…
▽ More
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of nearly linear algorithms by controlling the LoRA update computation term by term, assuming the Strong Exponential Time Hypothesis (SETH). For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $\mathbf{X}$, pretrained weights $\mathbf{W^\star}$, and adapter matrices $α\mathbf{B} \mathbf{A} / r$. Specifically, we derive a shared upper bound threshold for such norms and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $\mathbf{W}_V$ and $\mathbf{W}_Q$) and full adaptations (e.g., $\mathbf{W}_Q$, $\mathbf{W}_V$, and $\mathbf{W}_K$) of weights in attention heads.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for…
▽ More
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ are found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at 90% confidence level.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Dynamically Expanding Capacity of Autonomous Driving with Near-Miss Focused Training Framework
Authors:
Ziyuan Yang,
Zhaoyang Li,
Jianming Hu,
Yi Zhang
Abstract:
The long-tail distribution of real driving data poses challenges for training and testing autonomous vehicles (AV), where rare yet crucial safety-critical scenarios are infrequent. And virtual simulation offers a low-cost and efficient solution. This paper proposes a near-miss focused training framework for AV. Utilizing the driving scenario information provided by sensors in the simulator, we des…
▽ More
The long-tail distribution of real driving data poses challenges for training and testing autonomous vehicles (AV), where rare yet crucial safety-critical scenarios are infrequent. And virtual simulation offers a low-cost and efficient solution. This paper proposes a near-miss focused training framework for AV. Utilizing the driving scenario information provided by sensors in the simulator, we design novel reward functions, which enable background vehicles (BV) to generate near-miss scenarios and ensure gradients exist not only in collision-free scenes but also in collision scenarios. And then leveraging the Robust Adversarial Reinforcement Learning (RARL) framework for simultaneous training of AV and BV to gradually enhance AV and BV capabilities, as well as generating near-miss scenarios tailored to different levels of AV capabilities. Results from three testing strategies indicate that the proposed method generates scenarios closer to near-miss, thus enhancing the capabilities of both AVs and BVs throughout training.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Safeguarding Large Language Models: A Survey
Authors:
Yi Dong,
Ronghui Mu,
Yanghao Zhang,
Siqi Sun,
Tianle Zhang,
Changshun Wu,
Gaojie **,
Yi Qi,
**wei Hu,
Jie Meng,
Saddek Bensalem,
Xiaowei Huang
Abstract:
In the burgeoning field of Large Language Models (LLMs), develo** a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced int…
▽ More
In the burgeoning field of Large Language Models (LLMs), develo** a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable properties that a guardrail might want to enforce, such as hallucinations, fairness, privacy, and so on. Based on them, we review techniques to circumvent these controls (i.e., attacks), to defend the attacks, and to reinforce the guardrails. While the techniques mentioned above represent the current status and the active research trends, we also discuss several challenges that cannot be easily dealt with by the methods and present our vision on how to implement a comprehensive guardrail through the full consideration of multi-disciplinary approach, neural-symbolic method, and systems development lifecycle.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems
Authors:
Jason Hu,
Bowen Song,
Xiaojian Xu,
Liyue Shen,
Jeffrey A. Fessler
Abstract:
Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for th…
▽ More
Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for the entire image by training diffusion models only on patches of images. Specifically, we propose a patch-based position-aware diffusion inverse solver, called PaDIS, where we obtain the score function of the whole image through scores of patches and their positional encoding and utilize this as the prior for solving inverse problems. First of all, we show that this diffusion model achieves an improved memory efficiency and data efficiency while still maintaining the capability to generate entire images via positional encoding. Additionally, the proposed PaDIS model is highly flexible and can be plugged in with different diffusion inverse solvers (DIS). We demonstrate that the proposed PaDIS approach enables solving various inverse problems in both natural and medical image domains, including CT reconstruction, deblurring, and superresolution, given only patch-based priors. Notably, PaDIS outperforms previous DIS methods trained on entire image priors in the case of limited training data, demonstrating the data efficiency of our proposed approach by learning patch-based prior.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Authors:
Zefan Cai.,
Yichi Zhang,
Bofei Gao,
Yuliang Liu,
Tianyu Liu,
Keming Lu,
Wayne Xiong,
Yue Dong,
Baobao Chang,
Junjie Hu,
Wen Xiao
Abstract:
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling where attention is scattering widely in lower layers, progressively consolidating within specific contexts, and ultimately foc…
▽ More
In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling where attention is scattering widely in lower layers, progressively consolidating within specific contexts, and ultimately focusin on critical tokens (a.k.a massive activation or attention sink) in higher layers. Motivated by these insights, we developed PyramidKV, a novel and effective KV cache compression method. This approach dynamically adjusts the KV cache size across different layers, allocating more cache in lower layers and less in higher ones, diverging from traditional methods that maintain a uniform KV cache size. Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage. In scenarios emphasizing memory efficiency, where only 0.7% of the KV cache is maintained, PyramidKV surpasses other KV cache compression techniques achieving up to a 20.5 absolute accuracy improvement on TREC.
△ Less
Submitted 16 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Stochastic Carbon Footprint Tracing Methods in Power Systems
Authors:
Jiashuo Hu,
Xiao-** Zhang,
Youwei Jia
Abstract:
As the penetration of distributed energy resources (DER) and renewable energy sources (RES) increases, carbon footprint tracking requires more granular analysis results. Existing carbon footprint tracking methods focus on deterministic steady-state analysis where the high uncertainties of RES cannot be considered. Considering the deficiency of the existing deterministic method, this paper proposes…
▽ More
As the penetration of distributed energy resources (DER) and renewable energy sources (RES) increases, carbon footprint tracking requires more granular analysis results. Existing carbon footprint tracking methods focus on deterministic steady-state analysis where the high uncertainties of RES cannot be considered. Considering the deficiency of the existing deterministic method, this paper proposes two stochastic carbon footprint tracking methods to cope with the impact of RES uncertainty on load-side carbon footprint tracing. The first method introduces probabilistic analysis in the framework of carbon emissions flow (CEF) to provide a global reference for the spatial characteristic of the power system component carbon intensity distribution. Considering that the CEF network expands with the increasing penetration of DERs, the second method can effectively improve the computational efficiency over the first method while ensuring the computational accuracy on the large power systems. These proposed models are tested and compared in a synthetic 1004-bus test system in the case study to demonstrate the performance of the two proposed methods
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Uncorrelated estimations of $H_0$ redshift evolution from DESI baryon acoustic oscillation observations
Authors:
X. D. Jia,
J. P. Hu,
F. Y. Wang
Abstract:
The Dark Energy Spectroscopic Instrumnet (DESI) collaboration recently released the first year data of baryon acoustic oscillations (BAOs). Basing on the five different tracers, the cosmological constraint shows a hint of deviation from the standard $Λ$CDM model. In this letter, We combine the DESI BAOs with other cosmic probes to constrain the evolution of Hubble constant as a function of redshif…
▽ More
The Dark Energy Spectroscopic Instrumnet (DESI) collaboration recently released the first year data of baryon acoustic oscillations (BAOs). Basing on the five different tracers, the cosmological constraint shows a hint of deviation from the standard $Λ$CDM model. In this letter, We combine the DESI BAOs with other cosmic probes to constrain the evolution of Hubble constant as a function of redshift in flat $Λ$CDM model. The non-parametric method is used to estimate the value of Hubble constant at different redshift bins. The correlation among different bins are removed by diagonalizing the covariance matrix. The joint data sample demonstrate a decreasing trend of Hubble constant with a significance of $8.6 σ$, which can naturally resolve the Hubble tension. It may be due to dynamical dark energy or modified gravity.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
Authors:
Chen Mao,
**gqi Hu
Abstract:
Visual Geo-localization (VG) refers to the process to identify the location described in query images, which is widely applied in robotics field and computer vision tasks, such as autonomous driving, metaverse, augmented reality, and SLAM. In fine-grained images lacking specific text descriptions, directly applying pure visual methods to represent neighborhood features often leads to the model foc…
▽ More
Visual Geo-localization (VG) refers to the process to identify the location described in query images, which is widely applied in robotics field and computer vision tasks, such as autonomous driving, metaverse, augmented reality, and SLAM. In fine-grained images lacking specific text descriptions, directly applying pure visual methods to represent neighborhood features often leads to the model focusing on overly fine-grained features, unable to fully mine the semantic information in the images. Therefore, we propose a two-stage training method to enhance visual performance and use contrastive learning to mine challenging samples. We first leverage the multi-modal description capability of CLIP (Contrastive Language-Image Pretraining) to create a set of learnable text prompts for each geographic image feature to form vague descriptions. Then, by utilizing dynamic text prompts to assist the training of the image encoder, we enable the image encoder to learn better and more generalizable visual features. This strategy of applying text to purely visual tasks addresses the challenge of using multi-modal models for geographic images, which often suffer from a lack of precise descriptions, making them difficult to utilize widely. We validate the effectiveness of the proposed strategy on several large-scale visual geo-localization datasets, and our method achieves competitive results on multiple visual geo-localization datasets. Our code and model are available at https://github.com/Chain-Mao/ProGEO.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Decoupled Alignment for Robust Plug-and-Play Adaptation
Authors:
Haozheng Luo,
Jiahao Yu,
Wenxin Zhang,
Jialong Li,
Jerry Yao-Chieh Hu,
Xinyu Xing,
Han Liu
Abstract:
We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). Our main idea is to exploit knowledge distillation to extract the alignment information from existing well-aligned LLMs and integrate it into unaligned LLMs in a plug-and-play fashion. Methodology, we…
▽ More
We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). Our main idea is to exploit knowledge distillation to extract the alignment information from existing well-aligned LLMs and integrate it into unaligned LLMs in a plug-and-play fashion. Methodology, we employ delta debugging to identify the critical components of knowledge necessary for effective distillation. On the harmful question dataset, our method significantly enhances the average defense success rate by approximately 14.41%, reaching as high as 51.39%, in 17 unaligned pre-trained LLMs, without compromising performance.
△ Less
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of semileptonic $D^{+}_s$ decays via $e^+e^-\to D_s^{*+}D_s^{*-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are…
▽ More
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are ${\mathcal B}(D_s^+\to ηe^+ν_e)=(2.35\pm0.11_{\rm stat}\pm 0.10_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to η^\prime e^+ν_e)=(0.82\pm0.09_{\rm stat}\pm 0.04_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to φe^+ν_e)=(2.21\pm0.16_{\rm stat}\pm 0.11_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to f_0(980) e^+ν_e,f_0(980)\toπ^+π^-)=(0.15\pm0.02_{\rm stat}\pm 0.01_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to K^0 e^+ν_e)=(0.24\pm0.04_{\rm stat}\pm 0.01_{\rm syst})\%,$ and ${\mathcal B}(D_s^+\to K^{*0} e^+ν_e)=(0.19\pm0.03_{\rm stat}\pm 0.01_{\rm syst})\%.$ These results are consistent with those measured via the $e^+e^-\to D_s^{*\pm}D_s^{\mp}$ process by BESIII and CLEO. The hadronic transition form factors $D^+_s\to ηe^+ν_e$, $D^+_s\to η^\prime e^+ν_e$, and $D^+_s\to K^0 e^+ν_e$ at four-momentum transfer squared $q^2$ = 0 are determined to be $f^η_+(0) = 0.482 \pm 0.011_{\rm stat} \pm 0.009_{\rm syst}\pm0.004_{\rm input},$ $f^{η^{\prime}}_+(0) = 0.562 \pm 0.031_{\rm stat} \pm 0.014_{\rm
syst}\pm0.003_{\rm input},$ and $f^{K^0}_+(0) = 0.624 \pm 0.052_{\rm
stat} \pm 0.013_{\rm syst}\pm0.002_{\rm input}.$
△ Less
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…
▽ More
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
OLIVE: Object Level In-Context Visual Embeddings
Authors:
Timothy Ossowski,
Junjie Hu
Abstract:
Recent generalist vision-language models (VLMs) have demonstrated impressive reasoning capabilities across diverse multimodal tasks. However, these models still struggle with fine-grained object-level understanding and grounding. In terms of modeling, existing VLMs implicitly align text tokens with image patch tokens, which is ineffective for embedding alignment at the same granularity and inevita…
▽ More
Recent generalist vision-language models (VLMs) have demonstrated impressive reasoning capabilities across diverse multimodal tasks. However, these models still struggle with fine-grained object-level understanding and grounding. In terms of modeling, existing VLMs implicitly align text tokens with image patch tokens, which is ineffective for embedding alignment at the same granularity and inevitably introduces noisy spurious background features. Additionally, these models struggle when generalizing to unseen visual concepts and may not be reliable for domain-specific tasks without further fine-tuning. To address these limitations, we propose a novel method to prompt large language models with in-context visual object vectors, thereby enabling controllable object-level reasoning. This eliminates the necessity of fusing a lengthy array of image patch features and significantly speeds up training. Furthermore, we propose region-level retrieval using our object representations, facilitating rapid adaptation to new objects without additional training. Our experiments reveal that our method achieves competitive referring object classification and captioning performance, while also offering zero-shot generalization and robustness to visually challenging contexts.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Amplitude analysis of the radiative decay $B^0_s\to K^+K^-γ$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1061 additional authors not shown)
Abstract:
A search for radiative decay of $B^0_s$ mesons to orbitally excited $K^+K^-$ states is performed using proton proton collisions recorded by the \mbox{LHCb}\xspace experiment, corresponding to an integrated luminosity of 9~fb$^{-1}$. The dikaon spectrum in the mass range $m_{KK}<2400$~{\ensuremath{\,\text{Me\kern -0.1em V\!/}c^2}\xspace} is dominated by the $φ(1020)$ resonance that accounts for alm…
▽ More
A search for radiative decay of $B^0_s$ mesons to orbitally excited $K^+K^-$ states is performed using proton proton collisions recorded by the \mbox{LHCb}\xspace experiment, corresponding to an integrated luminosity of 9~fb$^{-1}$. The dikaon spectrum in the mass range $m_{KK}<2400$~{\ensuremath{\,\text{Me\kern -0.1em V\!/}c^2}\xspace} is dominated by the $φ(1020)$ resonance that accounts for almost 70$\%$ of the decay rate. Considering the possible contributions of $f_2{(1270)}$, $f'_2{(1525)}$ and $f_2{(2010)}$ meson states, the overall tensor contribution to the amplitude is measured to be \begin{equation}
{\cal F}_{\{f_2\}}=16.8\pm 0.5\mathrm{~(stat.)}\pm0.7\mathrm{~(syst.)}\%,\nonumber \end{equation}
mostly dominated by the $f'_2(1525)$ state. Several statistically equivalent solutions are obtained for the detailed resonant structure depending on whether the smaller amplitudes interfere destructively or constructively with the dominant amplitude. The preferred solution that corresponds to the lowest values of the fit fractions along with constructive interference leads to the relative branching ratio measurement \begin{equation}
\frac{{\cal B}(B^0_s\to f'_2γ)}{{\cal B}(B^0_s\toφγ)}= 19.4^{+0.9}_{-0.8}\mathrm{~(stat.)}{}^{+1.4}_{-0.5}\mathrm{~(syst.)}\pm0.5\mathrm{~(\cal{B})}\%\nonumber, \end{equation} where the last uncertainty is due to the ratio of measured branching fractions to the $K^+K^-$ final state. This result represents the first observation of the radiative $B^0_s\to f'_2(1525)γ$ decay, which is the second radiative transition observed in the $B^0_s$ sector.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling
Authors:
Sili Huang,
Jifeng Hu,
Zhejian Yang,
Liwei Yang,
Tao Luo,
Hechang Chen,
Lichao Sun,
Bo Yang
Abstract:
Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, called in-context RL. However, due to the quadratic computation complexity of a…
▽ More
Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents could emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, called in-context RL. However, due to the quadratic computation complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for its efficient ability to process long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the backbone of Decision Transformer (DT). Then, we propose a Decision Mamba-Hybrid (DM-H) with the merits of transformers and Mamba in high-quality prediction and long-term memory. Specifically, DM-H first generates high-value sub-goals from long-term memory through the Mamba model. Then, we use sub-goals to prompt the transformer, establishing high-quality predictions. Experimental results demonstrate that DM-H achieves state-of-the-art in long and short-term tasks, such as D4RL, Grid World, and Tmaze benchmarks. Regarding efficiency, the online testing of DM-H in the long-term task is 28$\times$ times faster than the transformer-based baselines.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
Authors:
Sili Huang,
Jifeng Hu,
Hechang Chen,
Lichao Sun,
Bo Yang
Abstract:
In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates…
▽ More
In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, which can be achieved by providing task prompts. Recent works demonstrated that in-context RL could emerge with self-improvement in a trial-and-error manner when treating RL tasks as an across-episodic sequential prediction problem. Despite the self-improvement not requiring gradient updates, current works still suffer from high computational costs when the across-episodic sequence increases with task horizons. To this end, we propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner. Specifically, IDT is inspired by the efficient hierarchical structure of human decision-making and thus reconstructs the sequence to consist of high-level decisions instead of low-level actions that interact with environments. As one high-level decision can guide multi-step low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art in long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of our IDT is \textbf{36$\times$} times faster than baselines in the D4RL benchmark and \textbf{27$\times$} times faster in the Grid World benchmark.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Search for $e^{+}e^{-}\toη'ψ(2S)$ at center-of-mass energies from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence lev…
▽ More
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence level are determined.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Authors:
Jiahao Yu,
Haozheng Luo,
Jerry Yao-Chieh Hu,
Wenbo Guo,
Han Liu,
Xinyu Xing
Abstract:
Along with the remarkable successes of Language language models, recent research also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or leveraging complicated algorithms to craft jailbreaking prompts…
▽ More
Along with the remarkable successes of Language language models, recent research also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or leveraging complicated algorithms to craft jailbreaking prompts. In this paper, we introduce BOOST, a simple attack that leverages only the eos tokens. We demonstrate that rather than constructing complicated jailbreaking prompts, the attacker can simply append a few eos tokens to the end of a harmful question. It will bypass the safety alignment of LLMs and lead to successful jailbreaking attacks. We further apply BOOST to four representative jailbreak methods and show that the attack success rates of these methods can be significantly enhanced by simply adding eos tokens to the prompt. To understand this simple but novel phenomenon, we conduct empirical analyses. Our analysis reveals that adding eos tokens makes the target LLM believe the input is much less harmful, and eos tokens have low attention values and do not affect LLM's understanding of the harmful questions, leading the model to actually respond to the questions. Our findings uncover how fragile an LLM is against jailbreak attacks, motivating the development of strong safety alignment approaches.
△ Less
Submitted 4 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Effects of degumming conditions on the structure of the regenerated silk fibroin and the properties of its film
Authors:
Ruixue Sun,
Junli Hu
Abstract:
The traditional degumming method using sodium carbonate solution severely damages the structure of silk fibroin and results in low molecular weight, which limits the properties and applications of silk materials. In this study, we report a modified degumming method and compared it with the traditional one. The results indicate that compared with the traditional degumming method, the modified degum…
▽ More
The traditional degumming method using sodium carbonate solution severely damages the structure of silk fibroin and results in low molecular weight, which limits the properties and applications of silk materials. In this study, we report a modified degumming method and compared it with the traditional one. The results indicate that compared with the traditional degumming method, the modified degumming method produces silk fibroin with higher \b{eta}-sheet content (53.43% vs 46.68%), higher crystallinity (44.92% vs 42.72%), and more concentrated molecular weight (100 ~ 300 kDa vs 72 ~ 300 kDa), while allowing for the same full removal of sericin. Correspondingly, the film of the silk fibroin prepared with the modified degumming method exhibits higher \b{eta}-sheet content (35.22% vs 32.82%), higher crystallinity (26.96% vs 25.53%), and higher tensile strength (87.91 MPa vs 72.62 MPa). This work provides a degumming method superior to the traditional one and lays a foundation for the development of high-performance silk materials.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Study of the decays $χ_{cJ} \rightarrow Λ\barΛφ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured t…
▽ More
Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured to be $( 2.99\pm1.24\pm0.19) \times 10^{-5}$, $(6.01\pm0.90\pm0.40 )\times 10^{-5}$, and $(7.13\pm0.81\pm0.36) \times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No obvious enhancement near the $Λ\barΛ$ production threshold or excited $Λ$ state is found in the $Λφ$ (or $\barΛφ$) system.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
Authors:
Bolin Ni,
**gCheng Hu,
Yixuan Wei,
Houwen Peng,
Zheng Zhang,
Gaofeng Meng,
Han Hu
Abstract:
In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs). This suite encompasses several key techniques, including supervised finetuning (SFT), reward modeling (RM), rejection sampling finetuning (RS), and direct preference optimization (DPO). The key components are as follows: (1) Xwin-LM-SFT, models initially finetuned with high-quality…
▽ More
In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs). This suite encompasses several key techniques, including supervised finetuning (SFT), reward modeling (RM), rejection sampling finetuning (RS), and direct preference optimization (DPO). The key components are as follows: (1) Xwin-LM-SFT, models initially finetuned with high-quality instruction data; (2) Xwin-Pair, a large-scale, multi-turn preference dataset meticulously annotated using GPT-4; (3) Xwin-RM, reward models trained on Xwin-Pair, developed at scales of 7B, 13B, and 70B parameters; (4) Xwin-Set, a multiwise preference dataset in which each prompt is linked to 64 unique responses generated by Xwin-LM-SFT and scored by Xwin-RM; (5) Xwin-LM-RS, models finetuned with the highest-scoring responses from Xwin-Set; (6) Xwin-LM-DPO, models further optimized on Xwin-Set using the DPO algorithm. Our evaluations on AlpacaEval and MT-bench demonstrate consistent and significant improvements across the pipeline, demonstrating the strength and scalability of Xwin-LM. The repository https://github.com/Xwin-LM/Xwin-LM will be continually updated to foster community research.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Initial measurement of reactor antineutrino oscillation at SNO+
Authors:
SNO+ Collaboration,
:,
A. Allega,
M. R. Anderson,
S. Andringa,
M. Askins,
D. J. Auty,
A. Bacon,
J. Baker,
F. Barão,
N. Barros,
R. Bayes,
E. W. Beier,
T. S. Bezerra,
A. Bialek,
S. D. Biller,
E. Blucher,
E. Caden,
E. J. Callaghan,
M. Chen,
S. Cheng,
B. Cleveland,
D. Cookman,
J. Corning,
M. A. Cox
, et al. (96 additional authors not shown)
Abstract:
The SNO+ collaboration reports its first spectral analysis of long-baseline reactor antineutrino oscillation using 114 tonne-years of data. Fitting the neutrino oscillation probability to the observed energy spectrum yields constraints on the neutrino mass-squared difference $Δm^2_{21}$. In the ranges allowed by previous measurements, the best-fit $Δm^2_{21}$ is (8.85$^{+1.10}_{-1.33}$) $\times$ 1…
▽ More
The SNO+ collaboration reports its first spectral analysis of long-baseline reactor antineutrino oscillation using 114 tonne-years of data. Fitting the neutrino oscillation probability to the observed energy spectrum yields constraints on the neutrino mass-squared difference $Δm^2_{21}$. In the ranges allowed by previous measurements, the best-fit $Δm^2_{21}$ is (8.85$^{+1.10}_{-1.33}$) $\times$ 10$^{-5}$ eV$^2$. This measurement is continuing in the next phases of SNO+ and is expected to surpass the present global precision on $Δm^2_{21}$ with about three years of data.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Distributed Bilevel Optimization with Communication Compression
Authors:
Yutong He,
Jie Hu,
Xinmeng Huang,
Songtao Lu,
Bin Wang,
Kun Yuan
Abstract:
Stochastic bilevel optimization tackles challenges involving nested optimization structures. Its fast-growing scale nowadays necessitates efficient distributed algorithms. In conventional distributed bilevel methods, each worker must transmit full-dimensional stochastic gradients to the server every iteration, leading to significant communication overhead and thus hindering efficiency and scalabil…
▽ More
Stochastic bilevel optimization tackles challenges involving nested optimization structures. Its fast-growing scale nowadays necessitates efficient distributed algorithms. In conventional distributed bilevel methods, each worker must transmit full-dimensional stochastic gradients to the server every iteration, leading to significant communication overhead and thus hindering efficiency and scalability. To resolve this issue, we introduce the first family of distributed bilevel algorithms with communication compression. The primary challenge in algorithmic development is mitigating bias in hypergradient estimation caused by the nested structure. We first propose C-SOBA, a simple yet effective approach with unbiased compression and provable linear speedup convergence. However, it relies on strong assumptions on bounded gradients. To address this limitation, we explore the use of moving average, error feedback, and multi-step compression in bilevel optimization, resulting in a series of advanced algorithms with relaxed assumptions and improved convergence properties. Numerical experiments show that our compressed bilevel algorithms can achieve $10\times$ reduction in communication overhead without severe performance degradation.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Deterministic fabrication of graphene hexagonal boron nitride moiré superlattices
Authors:
Rupini V. Kamat,
Aaron L. Sharpe,
Mihir Pendharkar,
Jenny Hu,
Steven J. Tran,
Gregory Zaborski Jr.,
Marisa Hocking,
Joe Finney,
Kenji Watanabe,
Takashi Taniguchi,
Marc A. Kastner,
Andrew J. Mannix,
Tony Heinz,
David Goldhaber-Gordon
Abstract:
The electronic properties of moiré heterostructures depend sensitively on the relative orientation between layers of the stack. For example, near-magic-angle twisted bilayer graphene (TBG) commonly shows superconductivity, yet a TBG sample with one of the graphene layers rotationally aligned to a hexagonal Boron Nitride (hBN) cladding layer provided the first experimental observation of orbital fe…
▽ More
The electronic properties of moiré heterostructures depend sensitively on the relative orientation between layers of the stack. For example, near-magic-angle twisted bilayer graphene (TBG) commonly shows superconductivity, yet a TBG sample with one of the graphene layers rotationally aligned to a hexagonal Boron Nitride (hBN) cladding layer provided the first experimental observation of orbital ferromagnetism. To create samples with aligned graphene/hBN, researchers often align edges of exfoliated flakes that appear straight in optical micrographs. However, graphene or hBN can cleave along either zig-zag or armchair lattice directions, introducing a 30 degree ambiguity in the relative orientation of two flakes. By characterizing the crystal lattice orientation of exfoliated flakes prior to stacking using Raman and second-harmonic generation for graphene and hBN, respectively, we unambiguously align monolayer graphene to hBN at a near-0 degree, not 30 degree, relative twist angle. We confirm this alignment by torsional force microscopy (TFM) of the graphene/hBN moiré on an open-face stack, and then by cryogenic transport measurements, after full encapsulation with a second, non-aligned hBN layer. This work demonstrates a key step toward systematically exploring the effects of the relative twist angle between dissimilar materials within moiré heterostructures.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.