-
Observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (600 additional authors not shown)
Abstract:
By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fra…
▽ More
By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fractions are measured to be $\mathcal{B}(D^0\rightarrow {K}_1(1270)^-(\to K^0_Sπ^-π^0)e^+ν_e)=(1.69^{+0.53}_{-0.46}\pm0.15)\times10^{-4}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0(\to K^0_Sπ^+π^-)e^+ν_e)=(1.47^{+0.45}_{-0.40}\pm0.20)\times10^{-4}$ with statistical significance of 5.4$σ$ and 5.6$σ$, respectively. When combined with measurements of the $K_1(1270)\to K^+π^-π$ decays, the absolute branching fractions are determined to be $\mathcal{B}(D^0\to K_1(1270)^-e^+ν_e)=(1.05^{+0.33}_{-0.28}\pm0.12\pm0.12)\times10^{-3}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0e^+ν_e)=(1.29^{+0.40}_{-0.35}\pm0.18\pm0.15)\times10^{-3}$. The first and second uncertainties are statistical and systematic, respectively, and the third uncertainties originate from the assumed branching fractions of the $K_1(1270)\to Kππ$ decays.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Authors:
Chenshuang Zhang,
Fei Pan,
Junmo Kim,
In So Kweon,
Chengzhi Mao
Abstract:
We establish rigorous benchmarks for visual perception robustness. Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific type of evaluation over synthetic corruptions, backgrounds, and textures, yet those robustness benchmarks are restricted in specified variations and have low synthetic quality. In this work, we introduce generative model as a data source for syn…
▽ More
We establish rigorous benchmarks for visual perception robustness. Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific type of evaluation over synthetic corruptions, backgrounds, and textures, yet those robustness benchmarks are restricted in specified variations and have low synthetic quality. In this work, we introduce generative model as a data source for synthesizing hard images that benchmark deep models' robustness. Leveraging diffusion models, we are able to generate images with more diversified backgrounds, textures, and materials than any prior work, where we term this benchmark as ImageNet-D. Experimental results show that ImageNet-D results in a significant accuracy drop to a range of vision models, from the standard ResNet visual classifier to the latest foundation models like CLIP and MiniGPT-4, significantly reducing their accuracy by up to 60\%. Our work suggests that diffusion models can be an effective source to test vision models. The code and dataset are available at https://github.com/chenshuang-zhang/imagenet_d.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
ReMamber: Referring Image Segmentation with Mamba Twister
Authors:
Yuhuan Yang,
Chaofan Ma,
Jiangchao Yao,
Zhun Zhong,
Ya Zhang,
Yanfeng Wang
Abstract:
Referring Image Segmentation (RIS) leveraging transformers has achieved great success on the interpretation of complex visual-language tasks. However, the quadratic computation cost makes it resource-consuming in capturing long-range visual-language dependencies. Fortunately, Mamba addresses this with efficient linear complexity in processing. However, directly applying Mamba to multi-modal intera…
▽ More
Referring Image Segmentation (RIS) leveraging transformers has achieved great success on the interpretation of complex visual-language tasks. However, the quadratic computation cost makes it resource-consuming in capturing long-range visual-language dependencies. Fortunately, Mamba addresses this with efficient linear complexity in processing. However, directly applying Mamba to multi-modal interactions presents challenges, primarily due to inadequate channel interactions for the effective fusion of multi-modal data. In this paper, we propose ReMamber, a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block. The Mamba Twister explicitly models image-text interaction, and fuses textual and visual features through its unique channel and spatial twisting mechanism. We achieve the state-of-the-art on three challenging benchmarks. Moreover, we conduct thorough analyses of ReMamber and discuss other fusion designs using Mamba. These provide valuable perspectives for future research.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Multi-Objective Trajectory Planning with Dual-Encoder
Authors:
Beibei Zhang,
Tian Xiang,
Chentao Mao,
Yuhua Zheng,
Shuai Li,
Haoyi Niu,
Xiangming Xi,
Wenyuan Bai,
Feng Gao
Abstract:
Time-jerk optimal trajectory planning is crucial in advancing robotic arms' performance in dynamic tasks. Traditional methods rely on solving complex nonlinear programming problems, bringing significant delays in generating optimized trajectories. In this paper, we propose a two-stage approach to accelerate time-jerk optimal trajectory planning. Firstly, we introduce a dual-encoder based transform…
▽ More
Time-jerk optimal trajectory planning is crucial in advancing robotic arms' performance in dynamic tasks. Traditional methods rely on solving complex nonlinear programming problems, bringing significant delays in generating optimized trajectories. In this paper, we propose a two-stage approach to accelerate time-jerk optimal trajectory planning. Firstly, we introduce a dual-encoder based transformer model to establish a good preliminary trajectory. This trajectory is subsequently refined through sequential quadratic programming to improve its optimality and robustness. Our approach outperforms the state-of-the-art by up to 79.72\% in reducing trajectory planning time. Compared with existing methods, our method shrinks the optimality gap with the objective function value decreasing by up to 29.9\%.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Cross section measurement of $e^+e^-\to ηψ(2S)$ and search for $e^+e^-\toη\tilde{X}(3872)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass en…
▽ More
The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass energies, upper limits at the 90\% confidence level on the cross section for $e^+e^-\toηψ(2S)$ and on the product of the $e^+e^-\toη\tilde{X}(3872)$ cross section with the branching fraction of $\tilde{X}(3872)\toπ^+π^- J/ψ$ are reported.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
A Mixed Multiscale Spectral Generalized Finite Element Method
Authors:
Christian Alber,
Chupeng Ma,
Robert Scheichl
Abstract:
We present a multiscale mixed finite element method for solving second order elliptic equations with general $L^{\infty}$-coefficients arising from flow in highly heterogeneous porous media. Our approach is based on a multiscale spectral generalized finite element method (MS-GFEM) and exploits the superior local mass conservation properties of mixed finite elements. Following the MS-GFEM framework…
▽ More
We present a multiscale mixed finite element method for solving second order elliptic equations with general $L^{\infty}$-coefficients arising from flow in highly heterogeneous porous media. Our approach is based on a multiscale spectral generalized finite element method (MS-GFEM) and exploits the superior local mass conservation properties of mixed finite elements. Following the MS-GFEM framework, optimal local approximation spaces are built for the velocity field by solving local eigenvalue problems over generalized harmonic spaces. The resulting global velocity space is then enriched suitably to ensure inf-sup stability. We develop the mixed MS-GFEM for both continuous and discrete formulations, with Raviart-Thomas based mixed finite elements underlying the discrete method. Exponential convergence with respect to local degrees of freedom is proven at both the continuous and discrete levels. Numerical results are presented to support the theory and to validate the proposed method.
△ Less
Submitted 4 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane
Authors:
Han Yan,
Yang Li,
Zhennan Wu,
Shenzhou Chen,
Weixuan Sun,
Taizhang Shang,
Weizhe Liu,
Tian Chen,
Xiaqiang Dai,
Chao Ma,
Hongdong Li,
Pan Ji
Abstract:
We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in one single tri-plane tensor, from which multiple Singed…
▽ More
We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in one single tri-plane tensor, from which multiple Singed Distance Function (SDF) fields can be decoded to represent the compositional shapes. During training, an auto-encoder compresses tri-planes into a latent space, and then the denoising diffusion process is employed to approximate the distribution of the compositional scenes. Frankenstein demonstrates promising results in generating room interiors as well as human avatars with automatically separated parts. The generated scenes facilitate many downstream applications, such as part-wise re-texturing, object rearrangement in the room or avatar cloth re-targeting.
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems
Authors:
Viet Dung Nguyen,
Reynold Bailey,
Gabriel J. Diaz,
Chengyi Ma,
Alexander Fix,
Alexander Ororbia
Abstract:
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate. Segmentation models trained using supervised machine learning can excel at this task, their effectiveness is determined by the degree of overlap between the narrow distributions of image properties defined by the target dataset and highly specific training datasets, of which there are f…
▽ More
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate. Segmentation models trained using supervised machine learning can excel at this task, their effectiveness is determined by the degree of overlap between the narrow distributions of image properties defined by the target dataset and highly specific training datasets, of which there are few. Attempts to broaden the distribution of existing eye image datasets through the inclusion of synthetic eye images have found that a model trained on synthetic images will often fail to generalize back to real-world eye images. In remedy, we use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data, and to prune the training dataset in a manner that maximizes distribution overlap. We demonstrate that our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
△ Less
Submitted 23 March, 2024;
originally announced March 2024.
-
InterFusion: Text-Driven Generation of 3D Human-Object Interaction
Authors:
Sisi Dai,
Wenhao Li,
Haowen Sun,
Haibin Huang,
Chongyang Ma,
Hui Huang,
Kai Xu,
Ruizhen Hu
Abstract:
In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with c…
▽ More
In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with complex spatial relationships. To effectively address these issues, we present InterFusion, a two-stage framework specifically designed for HOI generation. InterFusion involves human pose estimations derived from text as geometric priors, which simplifies the text-to-3D conversion process and introduces additional constraints for accurate object generation. At the first stage, InterFusion extracts 3D human poses from a synthesized image dataset depicting a wide range of interactions, subsequently map** these poses to interaction descriptions. The second stage of InterFusion capitalizes on the latest developments in text-to-3D generation, enabling the production of realistic and high-quality 3D HOI scenes. This is achieved through a local-global optimization process, where the generation of human body and object is optimized separately, and jointly refined with a global optimization of the entire scene, ensuring a seamless and contextually coherent integration. Our experimental results affirm that InterFusion significantly outperforms existing state-of-the-art methods in 3D HOI generation.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Federated Learning based on Pruning and Recovery
Authors:
Chengjie Ma
Abstract:
A novel federated learning training framework for heterogeneous environments is presented, taking into account the diverse network speeds of clients in realistic settings. This framework integrates asynchronous learning algorithms and pruning techniques, effectively addressing the inefficiencies of traditional federated learning algorithms in scenarios involving heterogeneous devices, as well as t…
▽ More
A novel federated learning training framework for heterogeneous environments is presented, taking into account the diverse network speeds of clients in realistic settings. This framework integrates asynchronous learning algorithms and pruning techniques, effectively addressing the inefficiencies of traditional federated learning algorithms in scenarios involving heterogeneous devices, as well as tackling the staleness issue and inadequate training of certain clients in asynchronous algorithms. Through the incremental restoration of model size during training, the framework expedites model training while preserving model accuracy. Furthermore, enhancements to the federated learning aggregation process are introduced, incorporating a buffering mechanism to enable asynchronous federated learning to operate akin to synchronous learning. Additionally, optimizations in the process of the server transmitting the global model to clients reduce communication overhead. Our experiments across various datasets demonstrate that: (i) significant reductions in training time and improvements in convergence accuracy are achieved compared to conventional asynchronous FL and HeteroFL; (ii) the advantages of our approach are more pronounced in scenarios with heterogeneous clients and non-IID client data.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Precise measurement of the $e^+e^-\to D_s^+D_s^-$ cross sections at center-of-mass energies from threshold to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel…
▽ More
Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel analysis and model tests, which are critical to understand vector charmonium-like states with masses between 4 and 5~GeV.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond
Authors:
Qiushi Sun,
Zhirui Chen,
Fangzhi Xu,
Kanzhi Cheng,
Chang Ma,
Zhangyue Yin,
Jianing Wang,
Chengcheng Han,
Renyu Zhu,
Shuai Yuan,
Qipeng Guo,
Xipeng Qiu,
Pengcheng Yin,
Xiaoli Li,
Fei Yuan,
Lingpeng Kong,
Xiang Li,
Zhiyong Wu
Abstract:
Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol…
▽ More
Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of Large Language Models). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations spanning through different stages. For applications, we also observe a co-evolving shift. It spans from initial endeavors to tackling specific scenarios, through exploring a diverse array of tasks during its rapid expansion, to currently focusing on tackling increasingly complex and varied real-world challenges. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we delve into both the opportunities and challenges associated with this field, alongside elucidating our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/NCISurvey.
△ Less
Submitted 23 June, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors
Authors:
Chenyang Ma,
Kai Lu,
Ta-Ying Cheng,
Niki Trigoni,
Andrew Markham
Abstract:
Current state-of-the-art spatial reasoning-enhanced VLMs are trained to excel at spatial visual question answering (VQA). However, we believe that higher-level 3D-aware tasks, such as articulating dynamic scene changes and motion planning, require a fundamental and explicit 3D understanding beyond current spatial VQA datasets. In this work, we present SpatialPIN, a framework designed to enhance th…
▽ More
Current state-of-the-art spatial reasoning-enhanced VLMs are trained to excel at spatial visual question answering (VQA). However, we believe that higher-level 3D-aware tasks, such as articulating dynamic scene changes and motion planning, require a fundamental and explicit 3D understanding beyond current spatial VQA datasets. In this work, we present SpatialPIN, a framework designed to enhance the spatial reasoning capabilities of VLMs through prompting and interacting with priors from multiple 3D foundation models in a zero-shot, training-free manner. Extensive experiments demonstrate that our spatial reasoning-imbued VLM performs well on various forms of spatial VQA and can extend to help in various downstream robotics tasks such as pick and stack and trajectory planning.
△ Less
Submitted 6 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be…
▽ More
Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level.
△ Less
Submitted 14 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning
Authors:
Chong Ma,
Hanqi Jiang,
Wenting Chen,
Yiwei Li,
Zihao Wu,
Xiaowei Yu,
Zhengliang Liu,
Lei Guo,
Dajiang Zhu,
Tuo Zhang,
Dinggang Shen,
Tianming Liu,
Xiang Li
Abstract:
In the medical multi-modal frameworks, the alignment of cross-modality features presents a significant challenge. However, existing works have learned features that are implicitly aligned from the data, without considering the explicit relationships in the medical context. This data-reliance may lead to low generalization of the learned alignment relationships. In this work, we propose the Eye-gaz…
▽ More
In the medical multi-modal frameworks, the alignment of cross-modality features presents a significant challenge. However, existing works have learned features that are implicitly aligned from the data, without considering the explicit relationships in the medical context. This data-reliance may lead to low generalization of the learned alignment relationships. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework to harness eye-gaze data for better alignment of medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze data in aligning medical images and text, and introduce a novel approach by using eye-gaze data, collected synchronously by radiologists during diagnostic evaluations. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets, where EGMA achieved state-of-the-art performance and stronger generalization across different datasets. Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal alignment framework.
△ Less
Submitted 13 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
PiGW: A Plug-in Generative Watermarking Framework
Authors:
Rui Ma,
Mengxi Guo,
Li Yuming,
Hengyuan Zhang,
Cong Ma,
Yuan Li,
Xiaodong Xie,
Shanghang Zhang
Abstract:
Integrating watermarks into generative images is a critical strategy for protecting intellectual property and enhancing artificial intelligence security. This paper proposes Plug-in Generative Watermarking (PiGW) as a general framework for integrating watermarks into generative images. More specifically, PiGW embeds watermark information into the initial noise using a learnable watermark embedding…
▽ More
Integrating watermarks into generative images is a critical strategy for protecting intellectual property and enhancing artificial intelligence security. This paper proposes Plug-in Generative Watermarking (PiGW) as a general framework for integrating watermarks into generative images. More specifically, PiGW embeds watermark information into the initial noise using a learnable watermark embedding network and an adaptive frequency spectrum mask. Furthermore, it optimizes training costs by gradually increasing timesteps. Extensive experiments demonstrate that PiGW enables embedding watermarks into the generated image with negligible quality loss while achieving true invisibility and high resistance to noise attacks. Moreover, PiGW can serve as a plugin for various commonly used generative structures and multimodal generative content types. Finally, we demonstrate how PiGW can also be utilized for detecting generated images, contributing to the promotion of secure AI development. The project code will be made available on GitHub.
△ Less
Submitted 19 March, 2024; v1 submitted 4 January, 2024;
originally announced March 2024.
-
SelfIE: Self-Interpretation of Large Language Model Embeddings
Authors:
Haozhe Chen,
Carl Vondrick,
Chengzhi Mao
Abstract:
How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passag…
▽ More
How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE's text descriptions on hidden embeddings also open up new avenues to control LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control that erases harmful knowledge in LLM without supervision targets.
△ Less
Submitted 25 March, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an a…
▽ More
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominated $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
From Chaos to Clarity: Time Series Anomaly Detection in Astronomical Observations
Authors:
Xinli Hao,
Yile Chen,
Chen Yang,
Zhihui Du,
Chaohong Ma,
Chao Wu,
Xiaofeng Meng
Abstract:
With the development of astronomical facilities, large-scale time series data observed by these facilities is being collected. Analyzing anomalies in these astronomical observations is crucial for uncovering potential celestial events and physical phenomena, thus advancing the scientific research process. However, existing time series anomaly detection methods fall short in tackling the unique cha…
▽ More
With the development of astronomical facilities, large-scale time series data observed by these facilities is being collected. Analyzing anomalies in these astronomical observations is crucial for uncovering potential celestial events and physical phenomena, thus advancing the scientific research process. However, existing time series anomaly detection methods fall short in tackling the unique characteristics of astronomical observations where each star is inherently independent but interfered by random concurrent noise, resulting in a high rate of false alarms. To overcome the challenges, we propose AERO, a novel two-stage framework tailored for unsupervised anomaly detection in astronomical observations. In the first stage, we employ a Transformer-based encoder-decoder architecture to learn the normal temporal patterns on each variate (i.e., star) in alignment with the characteristic of variate independence. In the second stage, we enhance the graph neural network with a window-wise graph structure learning to tackle the occurrence of concurrent noise characterized by spatial and temporal randomness. In this way, AERO is not only capable of distinguishing normal temporal patterns from potential anomalies but also effectively differentiating concurrent noise, thus decreasing the number of false alarms. We conducted extensive experiments on three synthetic datasets and three real-world datasets. The results demonstrate that AERO outperforms the compared baselines. Notably, compared to the state-of-the-art model, AERO improves the F1-score by up to 8.76% and 2.63% on synthetic and real-world datasets respectively.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
Authors:
Jia-Nan Li,
Quan Tu,
Cunli Mao,
Zhengtao Yu,
Ji-Rong Wen,
Rui Yan
Abstract:
Standard Large Language Models (LLMs) struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special token of \textit{End-of-Utterance} (EoU) in dialogues has the potential to aggregate information. We refer to the EoU tokens as ``conversational attention sinks'' (conv-attn sinks).…
▽ More
Standard Large Language Models (LLMs) struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special token of \textit{End-of-Utterance} (EoU) in dialogues has the potential to aggregate information. We refer to the EoU tokens as ``conversational attention sinks'' (conv-attn sinks). Accordingly, we introduce StreamingDialogue, which compresses long dialogue history into conv-attn sinks with minimal losses, and thus reduces computational complexity quadratically with the number of sinks (i.e., the number of utterances). Current LLMs already demonstrate the ability to handle long context window, e.g., a window size of 200k or more. To this end, by compressing utterances into EoUs, our method has the potential to handle more than 200k of utterances, resulting in a prolonged dialogue learning. In order to minimize information losses from reconstruction after compression, we design two learning strategies of short-memory reconstruction (SMR) and long-memory reactivation (LMR). Our method outperforms strong baselines in dialogue tasks and achieves a 4 $\times$ speedup while reducing memory usage by 18 $\times$ compared to dense attention recomputation.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
TriOS Schwarzschild Orbit Modeling: Robustness of Parameter Inference for Masses and Shapes of Triaxial Galaxies with Supermassive Black Holes
Authors:
Jacob Pilawa,
Emily R. Liepold,
Chung-Pei Ma
Abstract:
Evidence for the majority of the supermassive black holes in the local universe has been obtained dynamically from stellar motions with the Schwarzschild orbit superposition method. However, there have been only a handful of studies using simulated data to examine the ability of this method to reliably recover known input black hole masses $M_{BH}$ and other galaxy parameters. Here we conduct a co…
▽ More
Evidence for the majority of the supermassive black holes in the local universe has been obtained dynamically from stellar motions with the Schwarzschild orbit superposition method. However, there have been only a handful of studies using simulated data to examine the ability of this method to reliably recover known input black hole masses $M_{BH}$ and other galaxy parameters. Here we conduct a comprehensive assessment of the reliability of the triaxial Schwarzschild method at $\textit{simultaneously}$ determining $M_{BH}$, stellar mass-to-light ratio $M^{*}/L$, dark matter mass, and three intrinsic triaxial shape parameters of simulated galaxies. For each of 25 rounds of mock observations using simulated stellar kinematics and the $\texttt{TriOS}$ code, we derive best-fitting parameters and confidence intervals after a full search in the 6D parameter space with our likelihood-based model inference scheme. The two key mass parameters, $M_{BH}$ and $M^{*}/L$, are recovered within the 68% confidence interval, and other parameters are recovered between 68% and 95% confidence intervals. The spatially varying velocity anisotropy of the stellar orbits is also well recovered. We explore whether the goodness-of-fit measure used for galaxy model selection in our pipeline is biased by variable complexity across the 6D parameter space. In our tests, adding a penalty term to the likelihood measure either makes little difference, or worsens the recovery in some cases.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Determination of the number of $ψ(3686)$ events taken at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be…
▽ More
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.
△ Less
Submitted 28 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Lightning NeRF: Efficient Hybrid Scene Representation for Autonomous Driving
Authors:
Junyi Cao,
Zhichao Li,
Naiyan Wang,
Chao Ma
Abstract:
Recent studies have highlighted the promising application of NeRF in autonomous driving contexts. However, the complexity of outdoor environments, combined with the restricted viewpoints in driving scenarios, complicates the task of precisely reconstructing scene geometry. Such challenges often lead to diminished quality in reconstructions and extended durations for both training and rendering. To…
▽ More
Recent studies have highlighted the promising application of NeRF in autonomous driving contexts. However, the complexity of outdoor environments, combined with the restricted viewpoints in driving scenarios, complicates the task of precisely reconstructing scene geometry. Such challenges often lead to diminished quality in reconstructions and extended durations for both training and rendering. To tackle these challenges, we present Lightning NeRF. It uses an efficient hybrid scene representation that effectively utilizes the geometry prior from LiDAR in autonomous driving scenarios. Lightning NeRF significantly improves the novel view synthesis performance of NeRF and reduces computational overheads. Through evaluations on real-world datasets, such as KITTI-360, Argoverse2, and our private dataset, we demonstrate that our approach not only exceeds the current state-of-the-art in novel view synthesis quality but also achieves a five-fold increase in training speed and a ten-fold improvement in rendering speed. Codes are available at https://github.com/VISION-SJTU/Lightning-NeRF .
△ Less
Submitted 9 March, 2024;
originally announced March 2024.
-
Formal Verification of Unknown Stochastic Systems via Non-parametric Estimation
Authors:
Zhi Zhang,
Chenyu Ma,
Saleh Soudijani,
Sadegh Soudjani
Abstract:
A novel data-driven method for formal verification is proposed to study complex systems operating in safety-critical domains. The proposed approach is able to formally verify discrete-time stochastic dynamical systems against temporal logic specifications only using observation samples and without the knowledge of the model, and provide a probabilistic guarantee on the satisfaction of the specific…
▽ More
A novel data-driven method for formal verification is proposed to study complex systems operating in safety-critical domains. The proposed approach is able to formally verify discrete-time stochastic dynamical systems against temporal logic specifications only using observation samples and without the knowledge of the model, and provide a probabilistic guarantee on the satisfaction of the specification. We first propose the theoretical results for using non-parametric estimation to estimate an asymptotic upper bound for the \emph{Lipschitz constant} of the stochastic system, which can determine a finite abstraction of the system. Our results prove that the asymptotic convergence rate of the estimation is $O(n^{-\frac{1}{3+d}})$, where $d$ is the dimension of the system and $n$ is the data scale. We then construct interval Markov decision processes using two different data-driven methods, namely non-parametric estimation and empirical estimation of transition probabilities, to perform formal verification against a given temporal logic specification. Multiple case studies are presented to validate the effectiveness of the proposed methods.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents
Authors:
**yang Li,
Nan Huo,
Yan Gao,
Jiayi Shi,
Yingxiu Zhao,
Ge Qu,
Yurong Wu,
Chenhao Ma,
Jian-Guang Lou,
Reynold Cheng
Abstract:
Interactive Data Analysis, the collaboration between humans and LLM agents, enables real-time data exploration for informed decision-making. The challenges and costs of collecting realistic interactive logs for data analysis hinder the quantitative evaluation of Large Language Model (LLM) agents in this task. To mitigate this issue, we introduce Tapilot-Crossing, a new benchmark to evaluate LLM ag…
▽ More
Interactive Data Analysis, the collaboration between humans and LLM agents, enables real-time data exploration for informed decision-making. The challenges and costs of collecting realistic interactive logs for data analysis hinder the quantitative evaluation of Large Language Model (LLM) agents in this task. To mitigate this issue, we introduce Tapilot-Crossing, a new benchmark to evaluate LLM agents on interactive data analysis. Tapilot-Crossing contains 1024 interactions, covering 4 practical scenarios: Normal, Action, Private, and Private Action. Notably, Tapilot-Crossing is constructed by an economical multi-agent environment, Decision Company, with few human efforts. We evaluate popular and advanced LLM agents in Tapilot-Crossing, which underscores the challenges of interactive data analysis. Furthermore, we propose Adaptive Interaction Reflection (AIR), a self-generated reflection strategy that guides LLM agents to learn from successful history. Experiments demonstrate that Air can evolve LLMs into effective interactive data analysis agents, achieving a relative performance improvement of up to 44.5%.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to…
▽ More
Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
"In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning
Authors:
Chuanqi Cheng,
Quan Tu,
Wei Wu,
Shuo Shang,
Cunli Mao,
Zhengtao Yu,
Rui Yan
Abstract:
Personalized dialogue systems have gained significant attention in recent years for their ability to generate responses in alignment with different personas. However, most existing approaches rely on pre-defined personal profiles, which are not only time-consuming and labor-intensive to create but also lack flexibility. We propose In-Dialogue Learning (IDL), a fine-tuning framework that enhances t…
▽ More
Personalized dialogue systems have gained significant attention in recent years for their ability to generate responses in alignment with different personas. However, most existing approaches rely on pre-defined personal profiles, which are not only time-consuming and labor-intensive to create but also lack flexibility. We propose In-Dialogue Learning (IDL), a fine-tuning framework that enhances the ability of pre-trained large language models to leverage dialogue history to characterize persona for completing personalized dialogue generation tasks without pre-defined profiles. Our experiments on three datasets demonstrate that IDL brings substantial improvements, with BLEU and ROUGE scores increasing by up to 200% and 247%, respectively. Additionally, the results of human evaluations further validate the efficacy of our proposed method.
△ Less
Submitted 12 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
Authors:
Cong Ma,
Lei Qiao,
Chengkai Zhu,
Kai Liu,
Zelong Kong,
Qing Li,
Xueqi Zhou,
Yuheng Kan,
Wei Wu
Abstract:
Vehicle-to-everything (V2X) is a popular topic in the field of Autonomous Driving in recent years. Vehicle-infrastructure cooperation (VIC) becomes one of the important research area. Due to the complexity of traffic conditions such as blind spots and occlusion, it greatly limits the perception capabilities of single-view roadside sensing systems. To further enhance the accuracy of roadside percep…
▽ More
Vehicle-to-everything (V2X) is a popular topic in the field of Autonomous Driving in recent years. Vehicle-infrastructure cooperation (VIC) becomes one of the important research area. Due to the complexity of traffic conditions such as blind spots and occlusion, it greatly limits the perception capabilities of single-view roadside sensing systems. To further enhance the accuracy of roadside perception and provide better information to the vehicle side, in this paper, we constructed holographic intersections with various layouts to build a large-scale multi-sensor holographic vehicle-infrastructure cooperation dataset, called HoloVIC. Our dataset includes 3 different types of sensors (Camera, Lidar, Fisheye) and employs 4 sensor-layouts based on the different intersections. Each intersection is equipped with 6-18 sensors to capture synchronous data. While autonomous vehicles pass through these intersections for collecting VIC data. HoloVIC contains in total on 100k+ synchronous frames from different sensors. Additionally, we annotated 3D bounding boxes based on Camera, Fisheye, and Lidar. We also associate the IDs of the same objects across different devices and consecutive frames in sequence. Based on HoloVIC, we formulated four tasks to facilitate the development of related research. We also provide benchmarks for these tasks.
△ Less
Submitted 26 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Observation of $ψ(3686)\to 3φ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant str…
▽ More
Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Batched Nonparametric Contextual Bandits
Authors:
Rong Jiang,
Cong Ma
Abstract:
We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose a novel batch learning algorithm that achieves the optimal regret (up to logarithmic factors). In e…
▽ More
We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose a novel batch learning algorithm that achieves the optimal regret (up to logarithmic factors). In essence, our procedure dynamically splits the covariate space into smaller bins, carefully aligning their widths with the batch size. Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret in the fully online setting.
△ Less
Submitted 10 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Scaling Supervised Local Learning with Augmented Auxiliary Networks
Authors:
Chenxiang Ma,
Jibin Wu,
Chenyang Si,
Kay Chen Tan
Abstract:
Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. How…
▽ More
Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. However, existing local learning methods are confronted with a large accuracy gap with the BP counterpart, particularly for large-scale networks. This is due to the weak coupling between local layers and their subsequent network layers, as there is no gradient communication across layers. To tackle this issue, we put forward an augmented local learning method, dubbed AugLocal. AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy. We also propose to linearly reduce the depth of auxiliary networks as the hidden layer goes deeper, ensuring sufficient network capacity while reducing the computational cost of auxiliary networks. Our extensive experiments on four image classification datasets (i.e., CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up to tens of local layers with a comparable accuracy to BP-trained networks while reducing GPU memory usage by around 40%. The proposed AugLocal method, therefore, opens up a myriad of opportunities for training high-performance deep neural networks on resource-constrained platforms.Code is available at https://github.com/ChenxiangMA/AugLocal.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Efficient Online Learning for Networks of Two-Compartment Spiking Neurons
Authors:
Yujia Yin,
Xinyi Chen,
Chenxiang Ma,
Jibin Wu,
Kay Chen Tan
Abstract:
The brain-inspired Spiking Neural Networks (SNNs) have garnered considerable research interest due to their superior performance and energy efficiency in processing temporal signals. Recently, a novel multi-compartment spiking neuron model, namely the Two-Compartment LIF (TC-LIF) model, has been proposed and exhibited a remarkable capacity for sequential modelling. However, training the TC-LIF mod…
▽ More
The brain-inspired Spiking Neural Networks (SNNs) have garnered considerable research interest due to their superior performance and energy efficiency in processing temporal signals. Recently, a novel multi-compartment spiking neuron model, namely the Two-Compartment LIF (TC-LIF) model, has been proposed and exhibited a remarkable capacity for sequential modelling. However, training the TC-LIF model presents challenges stemming from the large memory consumption and the issue of gradient vanishing associated with the Backpropagation Through Time (BPTT) algorithm. To address these challenges, online learning methodologies emerge as a promising solution. Yet, to date, the application of online learning methods in SNNs has been predominantly confined to simplified Leaky Integrate-and-Fire (LIF) neuron models. In this paper, we present a novel online learning method specifically tailored for networks of TC-LIF neurons. Additionally, we propose a refined TC-LIF neuron model called Adaptive TC-LIF, which is carefully designed to enhance temporal information integration in online learning scenarios. Extensive experiments, conducted on various sequential benchmarks, demonstrate that our approach successfully preserves the superior sequential modeling capabilities of the TC-LIF neuron while incorporating the training efficiency and hardware friendliness of online learning. As a result, it offers a multitude of opportunities to leverage neuromorphic solutions for processing temporal signals.
△ Less
Submitted 24 February, 2024;
originally announced February 2024.
-
Empowering Large Language Model Agents through Action Learning
Authors:
Haiteng Zhao,
Chang Ma,
Guoyin Wang,
**g Su,
Lingpeng Kong,
**g**g Xu,
Zhi-Hong Deng,
Hongxia Yang
Abstract:
Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error, a key element of intelligent behavior. In this work, we argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents. While humans naturally expand their action spaces and develop skills through…
▽ More
Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error, a key element of intelligent behavior. In this work, we argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents. While humans naturally expand their action spaces and develop skills through experiential learning, LLM agents typically operate within fixed action spaces, limiting their potential for growth. To address these challenges, our study explores open-action learning for language agents. We introduce a framework LearnAct with an iterative learning strategy to create and improve actions in the form of Python functions. In each iteration, LLM revises and updates the currently available actions based on the errors identified in unsuccessful training tasks, thereby enhancing action effectiveness. Our experimental evaluations across Robotic Planning and Alfworld environments reveal that after learning on a few training task instances, our approach to open-action learning markedly improves agent performance for the type of task (by 32 percent in AlfWorld compared to ReAct+Reflexion, for instance) highlighting the importance of experiential action learning in the development of more intelligent LLM agents.
△ Less
Submitted 24 February, 2024;
originally announced February 2024.
-
On Schrödingerization based quantum algorithms for linear dynamical systems with inhomogeneous terms
Authors:
Shi **,
Nana Liu,
Chuwen Ma
Abstract:
We analyze the Schrödingerisation method for quantum simulation of a general class of non-unitary dynamics with inhomogeneous source terms. The Schrödingerisation technique, introduced in \cite{JLY22a,JLY23}, transforms any linear ordinary and partial differential equations with non-unitary dynamics into a system under unitary dynamics via a warped phase transition that maps the equations into a h…
▽ More
We analyze the Schrödingerisation method for quantum simulation of a general class of non-unitary dynamics with inhomogeneous source terms. The Schrödingerisation technique, introduced in \cite{JLY22a,JLY23}, transforms any linear ordinary and partial differential equations with non-unitary dynamics into a system under unitary dynamics via a warped phase transition that maps the equations into a higher dimension, making them suitable for quantum simulation. This technique can also be applied to these equations with inhomogeneous terms modeling source or forcing terms or boundary and interface conditions, and discrete dynamical systems such as iterative methods in numerical linear algebra, through extra equations in the system. Difficulty airses with the presense of inhomogeneous terms since it can change the stability of the original system.
In this paper, we systematically study--both theoretically and numerically--the important issue of recovering the original variables from the Schrödingerized equations, even when the evolution operator contains unstable modes. We show that even with unstable modes, one can still construct a stable scheme, yet to recover the original variable one needs to use suitable data in the extended space. We analyze and compare both the discrete and continuous Fourier transforms used in the extended dimension, and derive corresponding error estimates, which allows one to use the more appropriate transform for specific equations. We also provide a smoother initialization for the Schrodödingerized system to gain higher order accuracy in the extended space. We homogenize the inhomogeneous terms with a stretch transformation, making it easier to recover the original variable. Our recovering technique also provides a simple and generic framework to solve general ill-posed problems in a computationally stable way.
△ Less
Submitted 27 February, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Broadband noise and quasi-periodic oscillation characteristics of the X-ray pulsar RX J0440.9+4431
Authors:
P. P. Li,
L. Tao,
R. C. Ma,
M. Y. Ge,
Q. C. Zhao,
S. J. Zhao,
L. Zhang,
Q. C. Bu,
L. D. Kong,
Y. L. Tuo,
L. Ji,
S. Zhang,
J. L. Qu,
S. N. Zhang,
Y. Huang,
X. Ma,
W. T. Ye,
Q. C. Shui
Abstract:
We present a comprehensive timing analysis on the Be/X-ray binary pulsar RX J0440.9+4431 using observations from \textit{NICER} and \textit{Insight}-HXMT during the 2022--2023 outburst. The power density spectrum (PDS) of RX J0440.9+4431 exhibits typical aperiodic variability in X-ray flux across a wide frequency range. During a super-critical accretion state, we detect quasi-periodic oscillations…
▽ More
We present a comprehensive timing analysis on the Be/X-ray binary pulsar RX J0440.9+4431 using observations from \textit{NICER} and \textit{Insight}-HXMT during the 2022--2023 outburst. The power density spectrum (PDS) of RX J0440.9+4431 exhibits typical aperiodic variability in X-ray flux across a wide frequency range. During a super-critical accretion state, we detect quasi-periodic oscillations (QPOs) at 0.2--0.5\,Hz in the light curves of five pulses for RX J0440.9+4431. The observed QPOs manifest during flares, while the flares appear at the peaks of the pulse profiles on a timescale of seconds and are primarily caused by an increase in hard photons. These flares can be explained by increased material ingestion in the accretion column at a fixed phase, primarily generating hard photons. Alternatively, an increase in accretion rate, independent of phase, may result in highly beamed hard photons within the accretion column, causing the flares. We argue the origin of QPOs to instabilities within the accretion flow. Additionally, we find that the break frequencies in the noise power spectra align well with $\propto L_{\mathrm{x}}^{3 / 7}$ across three orders of magnitude in the luminosity, which points to a relatively strong magnetic field in RX J0440.9+4431, estimated to be \textasciitilde$10^{13}$\,G.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Authors:
Sifei Li,
Yuxin Zhang,
Fan Tang,
Chongyang Ma,
Weiming dong,
Changsheng Xu
Abstract:
With the development of diffusion models, text-guided image style transfer has demonstrated high-quality controllable synthesis results. However, the utilization of text for diverse music style transfer poses significant challenges, primarily due to the limited availability of matched audio-text datasets. Music, being an abstract and complex art form, exhibits variations and intricacies even withi…
▽ More
With the development of diffusion models, text-guided image style transfer has demonstrated high-quality controllable synthesis results. However, the utilization of text for diverse music style transfer poses significant challenges, primarily due to the limited availability of matched audio-text datasets. Music, being an abstract and complex art form, exhibits variations and intricacies even within the same genre, thereby making accurate textual descriptions challenging. This paper presents a music style transfer approach that effectively captures musical attributes using minimal data. We introduce a novel time-varying textual inversion module to precisely capture mel-spectrogram features at different levels. During inference, we propose a bias-reduced stylization technique to obtain stable results. Experimental results demonstrate that our method can transfer the style of specific instruments, as well as incorporate natural sounds to compose melodies. Samples and source code are available at https://lsfhuihuiff.github.io/MusicTI/.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects
Authors:
Zhaowei Zhang,
Fengshuo Bai,
Mingzhi Wang,
Haoyang Ye,
Chengdong Ma,
Yaodong Yang
Abstract:
The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety. While considerable strides have been made in addressing AI alignment challenges, existing methodologies primarily focus on technical facets, often neglecting the intricate sociotechnical nature of AI systems, which can lead to a misalignment betwee…
▽ More
The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety. While considerable strides have been made in addressing AI alignment challenges, existing methodologies primarily focus on technical facets, often neglecting the intricate sociotechnical nature of AI systems, which can lead to a misalignment between the development and deployment contexts. To this end, we posit a new problem worth exploring: Incentive Compatibility Sociotechnical Alignment Problem (ICSAP). We hope this can call for more researchers to explore how to leverage the principles of Incentive Compatibility (IC) from game theory to bridge the gap between technical and societal components to maintain AI consensus with human societies in different contexts. We further discuss three classical game problems for achieving IC: mechanism design, contract theory, and Bayesian persuasion, in addressing the perspectives, potentials, and challenges of solving ICSAP, and provide preliminary implementation conceptions.
△ Less
Submitted 1 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Search for the production of deuterons and antideuterons in e^+e^- annihilation at center-of-mass energies between 4.13 and 4.70 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (593 additional authors not shown)
Abstract:
Using a data sample of $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to ppπ^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed and the upper limit of the…
▽ More
Using a data sample of $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to ppπ^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed and the upper limit of the $e^+e^-\to ppπ^-\bar{d}+c.c.$ cross section is determined to be from 9.0 to 145 fb depending on the center-of-mass energy at the $90\%$ confidence level.
△ Less
Submitted 17 February, 2024;
originally announced February 2024.
-
Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation
Authors:
Ziyang Wang,
Chao Ma
Abstract:
Medical image segmentation is increasingly reliant on deep learning techniques, yet the promising performance often come with high annotation costs. This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework that leverages the capabilities of Convolutional Neural Network (CNN), Vision Transformer (ViT), and the cutting-edge Visual Mamba (VMamba) architecture fo…
▽ More
Medical image segmentation is increasingly reliant on deep learning techniques, yet the promising performance often come with high annotation costs. This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework that leverages the capabilities of Convolutional Neural Network (CNN), Vision Transformer (ViT), and the cutting-edge Visual Mamba (VMamba) architecture for medical image segmentation, especially when dealing with scribble-based annotations. The proposed WSL strategy incorporates three distinct architecture but same symmetrical encoder-decoder networks: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling. The key concept of this framework is a collaborative and cross-supervisory mechanism that employs pseudo labels to facilitate iterative learning and refinement across the networks. The effectiveness of Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed scribble annotations, where it surpasses the performance of a similar WSL framework utilizing only UNet or SwinUNet. This highlights its potential in scenarios with sparse or imprecise annotations. The source code is made publicly accessible.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Multi-Intent Attribute-Aware Text Matching in Searching
Authors:
Mingzhe Li,
Xiuying Chen,
**g Xiang,
Qishen Zhang,
Changsheng Ma,
Chenchen Dai,
**xiong Chang,
Zhongyi Liu,
Guannan Zhang
Abstract:
Text matching systems have become a fundamental service in most searching platforms. For instance, they are responsible for matching user queries to relevant candidate items, or rewriting the user-input query to a pre-selected high-performing one for a better search experience. In practice, both the queries and items often contain multiple attributes, such as the category of the item and the locat…
▽ More
Text matching systems have become a fundamental service in most searching platforms. For instance, they are responsible for matching user queries to relevant candidate items, or rewriting the user-input query to a pre-selected high-performing one for a better search experience. In practice, both the queries and items often contain multiple attributes, such as the category of the item and the location mentioned in the query, which represent condensed key information that is helpful for matching. However, most of the existing works downplay the effectiveness of attributes by integrating them into text representations as supplementary information. Hence, in this work, we focus on exploring the relationship between the attributes from two sides. Since attributes from two ends are often not aligned in terms of number and type, we propose to exploit the benefit of attributes by multiple-intent modeling. The intents extracted from attributes summarize the diverse needs of queries and provide rich content of items, which are more refined and abstract, and can be aligned for paired inputs. Concretely, we propose a multi-intent attribute-aware matching model (MIM), which consists of three main components: attribute-aware encoder, multi-intent modeling, and intent-aware matching. In the attribute-aware encoder, the text and attributes are weighted and processed through a scaled attention mechanism with regard to the attributes' importance. Afterward, the multi-intent modeling extracts intents from two ends and aligns them. Herein, we come up with a distribution loss to ensure the learned intents are diverse but concentrated, and a kullback-leibler divergence loss that aligns the learned intents. Finally, in the intent-aware matching, the intents are evaluated by a self-supervised masking task, and then incorporated to output the final matching result.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
Top-$K$ ranking with a monotone adversary
Authors:
Yuepeng Yang,
Antares Chen,
Lorenzo Orecchia,
Cong Ma
Abstract:
In this paper, we address the top-$K$ ranking problem with a monotone adversary. We consider the scenario where a comparison graph is randomly generated and the adversary is allowed to add arbitrary edges. The statistician's goal is then to accurately identify the top-$K$ preferred items based on pairwise comparisons derived from this semi-random comparison graph. The main contribution of this pap…
▽ More
In this paper, we address the top-$K$ ranking problem with a monotone adversary. We consider the scenario where a comparison graph is randomly generated and the adversary is allowed to add arbitrary edges. The statistician's goal is then to accurately identify the top-$K$ preferred items based on pairwise comparisons derived from this semi-random comparison graph. The main contribution of this paper is to develop a weighted maximum likelihood estimator (MLE) that achieves near-optimal sample complexity, up to a $\log^2(n)$ factor, where $n$ denotes the number of items under comparison. This is made possible through a combination of analytical and algorithmic innovations. On the analytical front, we provide a refined~$\ell_\infty$ error analysis of the weighted MLE that is more explicit and tighter than existing analyses. It relates the~$\ell_\infty$ error with the spectral properties of the weighted comparison graph. Motivated by this, our algorithmic innovation involves the development of an SDP-based approach to reweight the semi-random graph and meet specified spectral properties. Additionally, we propose a first-order method based on the Matrix Multiplicative Weight Update (MMWU) framework. This method efficiently solves the resulting SDP in nearly-linear time relative to the size of the semi-random comparison graph.
△ Less
Submitted 20 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation
Authors:
Chao Ma,
Ziyang Wang
Abstract:
Medical image segmentation is essential in diagnostics, treatment planning, and healthcare, with deep learning offering promising advancements. Notably, Convolutional Neural Network (CNN) excel in capturing local image features, whereas Vision Transformer (ViT) adeptly model long-range dependencies through multi-head self-attention mechanisms. Despite their strengths, both CNN and ViT face challen…
▽ More
Medical image segmentation is essential in diagnostics, treatment planning, and healthcare, with deep learning offering promising advancements. Notably, Convolutional Neural Network (CNN) excel in capturing local image features, whereas Vision Transformer (ViT) adeptly model long-range dependencies through multi-head self-attention mechanisms. Despite their strengths, both CNN and ViT face challenges in efficiently processing long-range dependencies within medical images, often requiring substantial computational resources. This issue, combined with the high cost and limited availability of expert annotations, poses significant obstacles to achieving precise segmentation. To address these challenges, this paper introduces the Semi-Mamba-UNet, which integrates a visual mamba-based UNet architecture with a conventional UNet into a semi-supervised learning (SSL) framework. This innovative SSL approach leverages dual networks to jointly generate pseudo labels and cross supervise each other, drawing inspiration from consistency regularization techniques. Furthermore, we introduce a self-supervised pixel-level contrastive learning strategy, employing a projector pair to further enhance feature learning capabilities. Our comprehensive evaluation on a publicly available MRI cardiac segmentation dataset, comparing against various SSL frameworks with different UNet-based segmentation networks, highlights the superior performance of Semi-Mamba-UNet. The source code has been made publicly accessible.
△ Less
Submitted 29 March, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Weather Prediction with Diffusion Guided by Realistic Forecast Processes
Authors:
Zhanxiang Hua,
Yutong He,
Chengqian Ma,
Alexandra Anderson-Frey
Abstract:
Weather forecasting remains a crucial yet challenging domain, where recently developed models based on deep learning (DL) have approached the performance of traditional numerical weather prediction (NWP) models. However, these DL models, often complex and resource-intensive, face limitations in flexibility post-training and in incorporating NWP predictions, leading to reliability concerns due to p…
▽ More
Weather forecasting remains a crucial yet challenging domain, where recently developed models based on deep learning (DL) have approached the performance of traditional numerical weather prediction (NWP) models. However, these DL models, often complex and resource-intensive, face limitations in flexibility post-training and in incorporating NWP predictions, leading to reliability concerns due to potential unphysical predictions. In response, we introduce a novel method that applies diffusion models (DM) for weather forecasting. In particular, our method can achieve both direct and iterative forecasting with the same modeling framework. Our model is not only capable of generating forecasts independently but also uniquely allows for the integration of NWP predictions, even with varying lead times, during its sampling process. The flexibility and controllability of our model empowers a more trustworthy DL system for the general weather community. Additionally, incorporating persistence and climatology data further enhances our model's long-term forecasting stability. Our empirical findings demonstrate the feasibility and generalizability of this approach, suggesting a promising direction for future, more sophisticated diffusion models without the need for retraining.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
The Essential Role of Causality in Foundation World Models for Embodied AI
Authors:
Tarun Gupta,
Wenbo Gong,
Chao Ma,
Nick Pawlowski,
Agrin Hilmkil,
Meyer Scetbon,
Marc Rigter,
Ade Famoti,
Ashley Juan Llorens,
Jianfeng Gao,
Stefan Bauer,
Danica Kragic,
Bernhard Schölkopf,
Cheng Zhang
Abstract:
Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for E…
▽ More
Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitating meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.
△ Less
Submitted 29 April, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Fluctuations and Persistence in Quantum Diffusion on Regular Lattices
Authors:
Cheng Ma,
Omar Malik,
G. Korniss
Abstract:
We investigate quantum persistence by analyzing amplitude and phase fluctuations of the wave function governed by the time-dependent free-particle Schrödinger equation. The quantum system is initialized with local random uncorrelated Gaussian amplitude and phase fluctuations. In analogy with classical diffusion, the persistence probability is defined as the probability that the local (amplitude or…
▽ More
We investigate quantum persistence by analyzing amplitude and phase fluctuations of the wave function governed by the time-dependent free-particle Schrödinger equation. The quantum system is initialized with local random uncorrelated Gaussian amplitude and phase fluctuations. In analogy with classical diffusion, the persistence probability is defined as the probability that the local (amplitude or phase) fluctuations have not changed sign up to time $t$. Our results show that the persistence probability in quantum diffusion exhibits exponential-like tails. More specifically, in $d=1$ the persistence probability decays in a stretched exponential fashion, while in $d=2$ and $d=3$ as an exponential. We also provide some insights by analyzing the two-point spatial and temporal correlation functions in the limit of small fluctuations. In particular, in the long-time limit, the temporal correlation functions for both local amplitude and phase fluctuations become time-homogeneous, i.e., the zero-crossing events correspond to those of a stationary Gaussian process, with sufficiently fast-decaying power-law tail of its autocorrelation function, implying an exponential-like tail of the persistence probabilities.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Scalable Algorithm for Finding Balanced Subgraphs with Tolerance in Signed Networks
Authors:
**gbang Chen,
Qiuyang Mang,
Hangrui Zhou,
Richard Peng,
Yu Gao,
Chenhao Ma
Abstract:
Signed networks, characterized by edges labeled as either positive or negative, offer nuanced insights into interaction dynamics beyond the capabilities of unsigned graphs. Central to this is the task of identifying the maximum balanced subgraph, crucial for applications like polarized community detection in social networks and portfolio analysis in finance. Traditional models, however, are limite…
▽ More
Signed networks, characterized by edges labeled as either positive or negative, offer nuanced insights into interaction dynamics beyond the capabilities of unsigned graphs. Central to this is the task of identifying the maximum balanced subgraph, crucial for applications like polarized community detection in social networks and portfolio analysis in finance. Traditional models, however, are limited by an assumption of perfect partitioning, which fails to mirror the complexities of real-world data. Addressing this gap, we introduce an innovative generalized balanced subgraph model that incorporates tolerance for irregularities. Our proposed region-based heuristic algorithm, tailored for this NP-hard problem, strikes a balance between low time complexity and high-quality outcomes. Comparative experiments validate its superior performance against leading solutions, delivering enhanced effectiveness (notably larger subgraph sizes) and efficiency (achieving up to 100x speedup) in both traditional and generalized contexts.
△ Less
Submitted 16 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Probing the internal structures of $pΩ$ and $ΩΩ$ with their production at the LHC
Authors:
Jie Pu,
Kai-Jia Sun,
Chun-Wang Ma,
Lie-Wen Chen
Abstract:
The strange dibaryons $pΩ$ ($^5\rm{S}_2$) and $ΩΩ$ ($^1\rm{S}_0$) are likely bound, existing either in molecular states like the deuteron or as more exotic compact six-quark states. Here, we investigate the production of these two dibaryons in Pb+Pb collisions at $\sqrt{s_{NN}}$=2.76 TeV at the CERN Large Hadron Collider (LHC) within a covariant coalescence model, which employs a blast-wave-like p…
▽ More
The strange dibaryons $pΩ$ ($^5\rm{S}_2$) and $ΩΩ$ ($^1\rm{S}_0$) are likely bound, existing either in molecular states like the deuteron or as more exotic compact six-quark states. Here, we investigate the production of these two dibaryons in Pb+Pb collisions at $\sqrt{s_{NN}}$=2.76 TeV at the CERN Large Hadron Collider (LHC) within a covariant coalescence model, which employs a blast-wave-like parametrization for the phase-space configurations of constituent particles at freeze-out. For the molecular states, the $pΩ$ and $ΩΩ$ are produced via $p$-$Ω$ and $Ω$-$Ω$ coalescence, respectively, while for the six-quark states, they are formed through $uudsss$ and $ssssss$ coalescence. We find that the yield ratio $N_{pΩ}/N_Ω$ and $N_{ΩΩ}/N_Ω$ have a distinct centrality dependence between the molecular and multi-quark states, thus offering a promising way for distinguishing the two states. Our results suggest that the measurements of $pΩ$ and $ΩΩ$ production in relativistic heavy-ion collisions can shed light on their internal structures.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
VRMM: A Volumetric Relightable Morphable Head Model
Authors:
Haotian Yang,
Mingwu Zheng,
Chongyang Ma,
Yu-Kun Lai,
Pengfei Wan,
Haibin Huang
Abstract:
In this paper, we introduce the Volumetric Relightable Morphable Model (VRMM), a novel volumetric and parametric facial prior for 3D face modeling. While recent volumetric prior models offer improvements over traditional methods like 3D Morphable Models (3DMMs), they face challenges in model learning and personalized reconstructions. Our VRMM overcomes these by employing a novel training framework…
▽ More
In this paper, we introduce the Volumetric Relightable Morphable Model (VRMM), a novel volumetric and parametric facial prior for 3D face modeling. While recent volumetric prior models offer improvements over traditional methods like 3D Morphable Models (3DMMs), they face challenges in model learning and personalized reconstructions. Our VRMM overcomes these by employing a novel training framework that efficiently disentangles and encodes latent spaces of identity, expression, and lighting into low-dimensional representations. This framework, designed with self-supervised learning, significantly reduces the constraints for training data, making it more feasible in practice. The learned VRMM offers relighting capabilities and encompasses a comprehensive range of expressions. We demonstrate the versatility and effectiveness of VRMM through various applications like avatar generation, facial reconstruction, and animation. Additionally, we address the common issue of overfitting in generative volumetric models with a novel prior-preserving personalization framework based on VRMM. Such an approach enables high-quality 3D face reconstruction from even a single portrait input. Our experiments showcase the potential of VRMM to significantly enhance the field of 3D face modeling.
△ Less
Submitted 8 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Precise Measurement of Born Cross Sections for $e^+e^-\to D\bar{D}$ and Observation of One Structure between $\sqrt{s} = 3.80-4.95$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (604 additional authors not shown)
Abstract:
Using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies ranging from 3.80 to 4.95 GeV, corresponding to an integrated luminosity of 20 fb$^{-1}$, a measurement of Born cross sections for the $e^+e^-\to D^{0}\bar{D}^{0}$ and $D^{+}D^{-}$ processes is presented with unprecedented precision. By performing a simultaneous fit to the dressed cross sections…
▽ More
Using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies ranging from 3.80 to 4.95 GeV, corresponding to an integrated luminosity of 20 fb$^{-1}$, a measurement of Born cross sections for the $e^+e^-\to D^{0}\bar{D}^{0}$ and $D^{+}D^{-}$ processes is presented with unprecedented precision. By performing a simultaneous fit to the dressed cross sections for both processes, one possible new structure around 3.9 GeV/$c^2$ is observed for the first time, in addition to seven known resonances $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $Y(4230)$, $Y(4360)$, $ψ(4415)$, and $Y(4660)$. These results offer crucial experimental insights into the nature of hadron production in the open charm region.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Continual Domain Adversarial Adaptation via Double-Head Discriminators
Authors:
Yan Shen,
Zhanghexuan Ji,
Chunwei Ma,
Mingchen Gao
Abstract:
Domain adversarial adaptation in a continual setting poses a significant challenge due to the limitations on accessing previous source domain data. Despite extensive research in continual learning, the task of adversarial adaptation cannot be effectively accomplished using only a small number of stored source domain data, which is a standard setting in memory replay approaches. This limitation ari…
▽ More
Domain adversarial adaptation in a continual setting poses a significant challenge due to the limitations on accessing previous source domain data. Despite extensive research in continual learning, the task of adversarial adaptation cannot be effectively accomplished using only a small number of stored source domain data, which is a standard setting in memory replay approaches. This limitation arises from the erroneous empirical estimation of $\gH$-divergence with few source domain samples. To tackle this problem, we propose a double-head discriminator algorithm, by introducing an addition source-only domain discriminator that are trained solely on source learning phase. We prove that with the introduction of a pre-trained source-only domain discriminator, the empirical estimation error of $\gH$-divergence related adversarial loss is reduced from the source domain side. Further experiments on existing domain adaptation benchmark show that our proposed algorithm achieves more than 2$\%$ improvement on all categories of target domain adaptation task while significantly mitigating the forgetting on source domain.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.