-
Measurement of absolute branching fractions of $D_s^+$ hadronic decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (632 additional authors not shown)
Abstract:
Using $e^+ e^-$ collision data collected at the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of $7.33~{\rm fb}^{-1}$, we determine the absolute branching fractions of fifteen hadronic $D_s^{+}$ decays with a double-tag technique. In particular, we make precise measurements of the branching fractions…
▽ More
Using $e^+ e^-$ collision data collected at the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of $7.33~{\rm fb}^{-1}$, we determine the absolute branching fractions of fifteen hadronic $D_s^{+}$ decays with a double-tag technique. In particular, we make precise measurements of the branching fractions $\mathcal{B}(D_s^+ \to K^+ K^- π^+)=(5.49 \pm 0.04 \pm 0.07)\%$, $\mathcal{B}(D_s^+ \to K_S^0 K^+)=(1.50 \pm 0.01 \pm 0.01)\%$ and $\mathcal{B}(D_s^+ \to K^+ K^- π^+ π^0)=(5.50 \pm 0.05 \pm 0.11)\%$, where the first uncertainties are statistical and the second ones are systematic. The \emph{CP} asymmetries in these decays are also measured and all are found to be compatible with zero.
△ Less
Submitted 30 May, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Genos: General In-Network Unsupervised Intrusion Detection by Rule Extraction
Authors:
Ruoyu Li,
Qing Li,
Yu Zhang,
Dan Zhao,
Xi Xiao,
Yong Jiang
Abstract:
Anomaly-based network intrusion detection systems (A-NIDS) use unsupervised models to detect unforeseen attacks. However, existing A-NIDS solutions suffer from low throughput, lack of interpretability, and high maintenance costs. Recent in-network intelligence (INI) exploits programmable switches to offer line-rate deployment of NIDS. Nevertheless, current in-network NIDS are either model-specific…
▽ More
Anomaly-based network intrusion detection systems (A-NIDS) use unsupervised models to detect unforeseen attacks. However, existing A-NIDS solutions suffer from low throughput, lack of interpretability, and high maintenance costs. Recent in-network intelligence (INI) exploits programmable switches to offer line-rate deployment of NIDS. Nevertheless, current in-network NIDS are either model-specific or only apply to supervised models. In this paper, we propose Genos, a general in-network framework for unsupervised A-NIDS by rule extraction, which consists of a Model Compiler, a Model Interpreter, and a Model Debugger. Specifically, observing benign data are multimodal and usually located in multiple subspaces in the feature space, we utilize a divide-and-conquer approach for model-agnostic rule extraction. In the Model Compiler, we first propose a tree-based clustering algorithm to partition the feature space into subspaces, then design a decision boundary estimation mechanism to approximate the source model in each subspace. The Model Interpreter interprets predictions by important attributes to aid network operators in understanding the predictions. The Model Debugger conducts incremental updating to rectify errors by only fine-tuning rules on affected subspaces, thus reducing maintenance costs. We implement a prototype using physical hardware, and experiments demonstrate its superior performance of 100 Gbps throughput, great interpretability, and trivial updating overhead.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation
Authors:
Adithya Kulkarni,
Oliver Eulenstein,
Qi Li
Abstract:
Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parsers' quality often varies depending on the domain and the language involved. Therefore, it is essential to combat the issue of varying quality to achieve stable performance. In various NLP tasks, aggregation methods are used for post-processing aggregation and have been s…
▽ More
Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parsers' quality often varies depending on the domain and the language involved. Therefore, it is essential to combat the issue of varying quality to achieve stable performance. In various NLP tasks, aggregation methods are used for post-processing aggregation and have been shown to combat the issue of varying quality. However, aggregation methods for post-processing aggregation have not been sufficiently studied in dependency parsing tasks. In an extensive empirical study, we compare different unsupervised post-processing aggregation methods to identify the most suitable dependency tree structure aggregation method.
△ Less
Submitted 3 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (600 additional authors not shown)
Abstract:
By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fra…
▽ More
By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fractions are measured to be $\mathcal{B}(D^0\rightarrow {K}_1(1270)^-(\to K^0_Sπ^-π^0)e^+ν_e)=(1.69^{+0.53}_{-0.46}\pm0.15)\times10^{-4}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0(\to K^0_Sπ^+π^-)e^+ν_e)=(1.47^{+0.45}_{-0.40}\pm0.20)\times10^{-4}$ with statistical significance of 5.4$σ$ and 5.6$σ$, respectively. When combined with measurements of the $K_1(1270)\to K^+π^-π$ decays, the absolute branching fractions are determined to be $\mathcal{B}(D^0\to K_1(1270)^-e^+ν_e)=(1.05^{+0.33}_{-0.28}\pm0.12\pm0.12)\times10^{-3}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0e^+ν_e)=(1.29^{+0.40}_{-0.35}\pm0.18\pm0.15)\times10^{-3}$. The first and second uncertainties are statistical and systematic, respectively, and the third uncertainties originate from the assumed branching fractions of the $K_1(1270)\to Kππ$ decays.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Learning Inclusion Matching for Animation Paint Bucket Colorization
Authors:
Yuekun Dai,
Shangchen Zhou,
Qinyue Li,
Chongyi Li,
Chen Change Loy
Abstract:
Colorizing line art is a pivotal task in the production of hand-drawn cel animation. This typically involves digital painters using a paint bucket tool to manually color each segment enclosed by lines, based on RGB values predetermined by a color designer. This frame-by-frame process is both arduous and time-intensive. Current automated methods mainly focus on segment matching. This technique migr…
▽ More
Colorizing line art is a pivotal task in the production of hand-drawn cel animation. This typically involves digital painters using a paint bucket tool to manually color each segment enclosed by lines, based on RGB values predetermined by a color designer. This frame-by-frame process is both arduous and time-intensive. Current automated methods mainly focus on segment matching. This technique migrates colors from a reference to the target frame by aligning features within line-enclosed segments across frames. However, issues like occlusion and wrinkles in animations often disrupt these direct correspondences, leading to mismatches. In this work, we introduce a new learning-based inclusion matching pipeline, which directs the network to comprehend the inclusion relationships between segments rather than relying solely on direct visual correspondences. Our method features a two-stage pipeline that integrates a coarse color war** module with an inclusion matching module, enabling more nuanced and accurate colorization. To facilitate the training of our network, we also develope a unique dataset, referred to as PaintBucket-Character. This dataset includes rendered line arts alongside their colorized counterparts, featuring various 3D characters. Extensive experiments demonstrate the effectiveness and superiority of our method over existing techniques.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Super-Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using SDO/HMI Data and an Attention-Aided Convolutional Neural Network
Authors:
Chunhui Xu,
Jason T. L. Wang,
Haimin Wang,
Haodi Jiang,
Qin Li,
Yasser Abduallah,
Yan Xu
Abstract:
Image super-resolution has been an important subject in image processing and recognition. Here, we present an attention-aided convolutional neural network (CNN) for solar image super-resolution. Our method, named SolarCNN, aims to enhance the quality of line-of-sight (LOS) magnetograms of solar active regions (ARs) collected by the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric…
▽ More
Image super-resolution has been an important subject in image processing and recognition. Here, we present an attention-aided convolutional neural network (CNN) for solar image super-resolution. Our method, named SolarCNN, aims to enhance the quality of line-of-sight (LOS) magnetograms of solar active regions (ARs) collected by the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric Observatory (SOHO). The ground-truth labels used for training SolarCNN are the LOS magnetograms collected by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). Solar ARs consist of strong magnetic fields in which magnetic energy can suddenly be released to produce extreme space weather events, such as solar flares, coronal mass ejections, and solar energetic particles. SOHO/MDI covers Solar Cycle 23, which is stronger with more eruptive events than Cycle 24. Enhanced SOHO/MDI magnetograms allow for better understanding and forecasting of violent events of space weather. Experimental results show that SolarCNN improves the quality of SOHO/MDI magnetograms in terms of the structural similarity index measure (SSIM), Pearson's correlation coefficient (PCC), and the peak signal-to-noise ratio (PSNR).
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
A family of Chatterjee's correlation coefficients and their properties
Authors:
Muhong Gao,
Qizhai Li
Abstract:
Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel in identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $ξ^{(h,F)}_n$, characterized by a…
▽ More
Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel in identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $ξ^{(h,F)}_n$, characterized by a continuous bivariate function $h$ and a cdf function $F$. By offering a range of selections for $h$ and $F$, $ξ^{(h,F)}_n$ encompasses a diverse class of novel correlation coefficients, while also incorporates the Chatterjee's correlation coefficient (Chatterjee, 2021) as a special case. We prove that $ξ^{(h,F)}_n$ converges almost surely to a deterministic limit $ξ^{(h,F)}$ as sample size $n$ approaches infinity. In addition, under appropriate conditions imposed on $h$ and $F$, the limit $ξ^{(h,F)}$ satisfies the three appealing properties: (P1). it belongs to the range of $[0,1]$; (P2). it equals 1 if and only if $Y$ is a measurable function of $X$; and (P3). it equals 0 if and only if $Y$ is independent of $X$. As amplified by our numerical experiments, our proposals provide practitioners with a variety of options to choose the most suitable correlation coefficient tailored to their specific practical needs.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Authors:
Junhao Zheng,
Chenhao Lin,
Jiahao Sun,
Zhengyu Zhao,
Qian Li,
Chao Shen
Abstract:
Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks. Previous physical attacks against MDE models rely on 2D adversarial patches, so they only affect a small, localized region in the MDE map but fail under various viewpoints. To address these limitations, we propose 3D Depth Fool (3D$^2$Fool), the first 3…
▽ More
Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks. Previous physical attacks against MDE models rely on 2D adversarial patches, so they only affect a small, localized region in the MDE map but fail under various viewpoints. To address these limitations, we propose 3D Depth Fool (3D$^2$Fool), the first 3D texture-based adversarial attack against MDE models. 3D$^2$Fool is specifically optimized to generate 3D adversarial textures agnostic to model types of vehicles and to have improved robustness in bad weather conditions, such as rain and fog. Experimental results validate the superior performance of our 3D$^2$Fool across various scenarios, including vehicles, MDE models, weather conditions, and viewpoints. Real-world experiments with printed 3D textures on physical vehicle models further demonstrate that our 3D$^2$Fool can cause an MDE error of over 10 meters.
△ Less
Submitted 27 March, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Cross section measurement of $e^+e^-\to ηψ(2S)$ and search for $e^+e^-\toη\tilde{X}(3872)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass en…
▽ More
The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass energies, upper limits at the 90\% confidence level on the cross section for $e^+e^-\toηψ(2S)$ and on the product of the $e^+e^-\toη\tilde{X}(3872)$ cross section with the branching fraction of $\tilde{X}(3872)\toπ^+π^- J/ψ$ are reported.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
A Distributionally Robust Model Predictive Control for Static and Dynamic Uncertainties in Smart Grids
Authors:
Qi Li,
Ye Shi,
Yuning Jiang,
Yuanming Shi,
Haoyu Wang,
H. Vincent Poor
Abstract:
The integration of various power sources, including renewables and electric vehicles, into smart grids is expanding, introducing uncertainties that can result in issues like voltage imbalances, load fluctuations, and power losses. These challenges negatively impact the reliability and stability of online scheduling in smart grids. Existing research often addresses uncertainties affecting current s…
▽ More
The integration of various power sources, including renewables and electric vehicles, into smart grids is expanding, introducing uncertainties that can result in issues like voltage imbalances, load fluctuations, and power losses. These challenges negatively impact the reliability and stability of online scheduling in smart grids. Existing research often addresses uncertainties affecting current states but overlooks those that impact future states, such as the unpredictable charging patterns of electric vehicles. To distinguish between these, we term them static uncertainties and dynamic uncertainties, respectively. This paper introduces WDR-MPC, a novel approach that stands for two-stage Wasserstein-based Distributionally Robust (WDR) optimization within a Model Predictive Control (MPC) framework, aimed at effectively managing both types of uncertainties in smart grids. The dynamic uncertainties are first reformulated into ambiguity tubes and then the distributionally robust bounds of both dynamic and static uncertainties can be established using WDR optimization. By employing ambiguity tubes and WDR optimization, the stochastic MPC system is converted into a nominal one. Moreover, we develop a convex reformulation method to speed up WDR computation during the two-stage optimization. The distinctive contribution of this paper lies in its holistic approach to both static and dynamic uncertainties in smart grids. Comprehensive experiment results on IEEE 38-bus and 94-bus systems reveal the method's superior performance and the potential to enhance grid stability and reliability.
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
ChebMixer: Efficient Graph Representation Learning with MLP Mixer
Authors:
Xiaoyan Kui,
Haonan Yan,
Qinsong Li,
Liming Chen,
Beiji Zou
Abstract:
Graph neural networks have achieved remarkable success in learning graph representations, especially graph Transformer, which has recently shown superior performance on various graph mining tasks. However, graph Transformer generally treats nodes as tokens, which results in quadratic complexity regarding the number of nodes during self-attention computation. The graph MLP Mixer addresses this chal…
▽ More
Graph neural networks have achieved remarkable success in learning graph representations, especially graph Transformer, which has recently shown superior performance on various graph mining tasks. However, graph Transformer generally treats nodes as tokens, which results in quadratic complexity regarding the number of nodes during self-attention computation. The graph MLP Mixer addresses this challenge by using the efficient MLP Mixer technique from computer vision. However, the time-consuming process of extracting graph tokens limits its performance. In this paper, we present a novel architecture named ChebMixer, a newly graph MLP Mixer that uses fast Chebyshev polynomials-based spectral filtering to extract a sequence of tokens. Firstly, we produce multiscale representations of graph nodes via fast Chebyshev polynomial-based spectral filtering. Next, we consider each node's multiscale representations as a sequence of tokens and refine the node representation with an effective MLP Mixer. Finally, we aggregate the multiscale representations of nodes through Chebyshev interpolation. Owing to the powerful representation capabilities and fast computational properties of MLP Mixer, we can quickly extract more informative node representations to improve the performance of downstream tasks. The experimental results prove our significant improvements in a variety of scenarios ranging from graph node classification to medical image segmentation.
△ Less
Submitted 4 June, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
A Coupled Optimization Framework for Correlated Equilibria in Normal-Form Game
Authors:
Sarah H. Q. Li,
Yue Yu,
Florian Dörfler,
John Lygeros
Abstract:
In competitive multi-player interactions, simultaneous optimality is a key requirement for establishing strategic equilibria. This property is explicit when the game-theoretic equilibrium is the simultaneously optimal solution of coupled optimization problems. However, no such optimization problems exist for the correlated equilibrium, a strategic equilibrium where the players can correlate their…
▽ More
In competitive multi-player interactions, simultaneous optimality is a key requirement for establishing strategic equilibria. This property is explicit when the game-theoretic equilibrium is the simultaneously optimal solution of coupled optimization problems. However, no such optimization problems exist for the correlated equilibrium, a strategic equilibrium where the players can correlate their actions. We address the lack of a coupled optimization framework for the correlated equilibrium by introducing an {unnormalized game} -- an extension of normal-form games in which the player strategies are lifted to unnormalized measures over the joint actions. We show that the set of fully mixed generalized Nash equilibria of this unnormalized game is a subset of the correlated equilibrium of the normal-form game. Furthermore, we introduce an entropy regularization to the unnormalized game and prove that the entropy-regularized generalized Nash equilibrium is a sub-optimal correlated equilibrium of the normal form game where the degree of sub-optimality depends on the magnitude of regularization. We prove that the entropy-regularized unnormalized game has a closed-form solution, and empirically verify its computational efficacy at approximating the correlated equilibrium of normal-form games.
△ Less
Submitted 3 April, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Innovative Quantitative Analysis for Disease Progression Assessment in Familial Cerebral Cavernous Malformations
Authors:
Ruige Zong,
Tao Wang,
Chunwang Li,
Xinlin Zhang,
Yuanbin Chen,
Longxuan Zhao,
Qixuan Li,
Qinquan Gao,
Dezhi Kang,
Fuxin Lin,
Tong Tong
Abstract:
Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. The FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions ha…
▽ More
Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. The FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions have progressed. To alleviate this problem, we propose a quantitative statistical framework for FCCM, comprising an efficient annotation module, an FCCM lesion segmentation module, and an FCCM lesion quantitative statistics module. Our framework demonstrates precise segmentation of the FCCM lesion based on efficient data annotation, achieving a Dice coefficient of 93.22\%. More importantly, we focus on quantitative statistics of lesions, which is combined with image registration to realize the quantitative comparison of lesions between different examinations of patients, and a visualization framework has been established for doctors to comprehensively compare and analyze lesions. The experimental results have demonstrated that our proposed framework not only obtains objective, accurate, and comprehensive quantitative statistical information, which provides a quantitative assessment method for disease progression and drug efficacy study, but also considerably reduces the manual measurement and statistical workload of lesions, assisting clinical decision-making for FCCM and accelerating progress in FCCM clinical research. This highlights the potential of practical application of the framework in FCCM clinical research and clinical decision-making. The codes are available at https://github.com/6zrg/Quantitative-Statistics-of-FCCM.
△ Less
Submitted 23 March, 2024;
originally announced March 2024.
-
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Authors:
Jun Guo,
Xiaojian Ma,
Yue Fan,
Hua** Liu,
Qing Li
Abstract:
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, withwide-ranging applications in embodied agents and augmented reality systems. Previous approaches haveadopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce SemanticGaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our keyidea…
▽ More
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, withwide-ranging applications in embodied agents and augmented reality systems. Previous approaches haveadopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce SemanticGaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our keyidea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approachthat maps various 2Dsemantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, withoutthe additional training required by NeRFs. We further build a 3D semantic network that directly predictsthe semantic component from raw 3D Gaussians for fast inference. We explore several applications ofSemantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0%mAcc improvement over prior open-vocabulary scene understanding counterparts; object part segmentation,sceneediting, and spatial-temporal segmentation with better qualitative results over 2D and 3D baselines,highlighting its versatility and effectiveness on supporting diverse downstream tasks.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
From Hardware Fingerprint to Access Token: Enhancing the Authentication on IoT Devices
Authors:
Yue Xiao,
Yi He,
Xiaoli Zhang,
Qian Wang,
Renjie Xie,
Kun Sun,
Ke Xu,
Qi Li
Abstract:
The proliferation of consumer IoT products in our daily lives has raised the need for secure device authentication and access control. Unfortunately, these resource-constrained devices typically use token-based authentication, which is vulnerable to token compromise attacks that allow attackers to impersonate the devices and perform malicious operations by stealing the access token. Using hardware…
▽ More
The proliferation of consumer IoT products in our daily lives has raised the need for secure device authentication and access control. Unfortunately, these resource-constrained devices typically use token-based authentication, which is vulnerable to token compromise attacks that allow attackers to impersonate the devices and perform malicious operations by stealing the access token. Using hardware fingerprints to secure their authentication is a promising way to mitigate these threats. However, once attackers have stolen some hardware fingerprints (e.g., via MitM attacks), they can bypass the hardware authentication by training a machine learning model to mimic fingerprints or reusing these fingerprints to craft forge requests.
In this paper, we present MCU-Token, a secure hardware fingerprinting framework for MCU-based IoT devices even if the cryptographic mechanisms (e.g., private keys) are compromised. MCU-Token can be easily integrated with various IoT devices by simply adding a short hardware fingerprint-based token to the existing payload. To prevent the reuse of this token, we propose a message map** approach that binds the token to a specific request via generating the hardware fingerprints based on the request payload. To defeat the machine learning attacks, we mix the valid fingerprints with poisoning data so that attackers cannot train a usable model with the leaked tokens. MCU-Token can defend against armored adversary who may replay, craft, and offload the requests via MitM or use both hardware (e.g., use identical devices) and software (e.g., machine learning attacks) strategies to mimic the fingerprints. The system evaluation shows that MCU-Token can achieve high accuracy (over 97%) with a low overhead across various IoT devices and application scenarios.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Precise measurement of the $e^+e^-\to D_s^+D_s^-$ cross sections at center-of-mass energies from threshold to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel…
▽ More
Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel analysis and model tests, which are critical to understand vector charmonium-like states with masses between 4 and 5~GeV.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Search for R-Parity-Violation-Induced Charged Lepton Flavor Violation at Future Lepton Colliders
Authors:
Xunye Cai,
**gshu Li,
Ran Ding,
Meng Lu,
Zhengyun You,
Qiang Li
Abstract:
Interest in searches for Charged Lepton Flavor Violation (CLFV) has continued in the past few decades since the observation of CLFV will indicate new physics beyond the Standard Model (BSM). As several future lepton colliders with high luminosity have been proposed, the search for CLFV will reach an unprecedented level of precision. Many BSM models allow CLFV processes at the tree level, such as t…
▽ More
Interest in searches for Charged Lepton Flavor Violation (CLFV) has continued in the past few decades since the observation of CLFV will indicate new physics beyond the Standard Model (BSM). As several future lepton colliders with high luminosity have been proposed, the search for CLFV will reach an unprecedented level of precision. Many BSM models allow CLFV processes at the tree level, such as the R-parity-violating (RPV) Minimal Supersymmetric Standard Model (MSSM), which is a good choice for benchmark. In this paper, we perform a detailed fast Monte Carlo simulation study on RPV-induced CLFV processes at future lepton colliders, including a 240 GeV circular electron positron collider (CEPC) and a 6 or 14 TeV muon collider. As a result, we found that the upper limits on the $τ$ related RPV couplings will be significantly improved, while several new limits on RPV couplings can be set, which are inaccessible by low-energy experiments.
△ Less
Submitted 2 June, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Generative Active Learning for Image Synthesis Personalization
Authors:
Xulu Zhang,
Wengyu Zhang,
Xiao-Yong Wei,
**lin Wu,
Zhaoxiang Zhang,
Zhen Lei,
Qing Li
Abstract:
This paper presents a pilot study that explores the application of active learning, traditionally studied in the context of discriminative models, to generative models. We specifically focus on image synthesis personalization tasks. The primary challenge in conducting active learning on generative models lies in the open-ended nature of querying, which differs from the closed form of querying in d…
▽ More
This paper presents a pilot study that explores the application of active learning, traditionally studied in the context of discriminative models, to generative models. We specifically focus on image synthesis personalization tasks. The primary challenge in conducting active learning on generative models lies in the open-ended nature of querying, which differs from the closed form of querying in discriminative models that typically target a single concept. We introduce the concept of anchor directions to transform the querying process into a semi-open problem. We propose a direction-based uncertainty sampling strategy to enable generative active learning and tackle the exploitation-exploration dilemma. Extensive experiments are conducted to validate the effectiveness of our approach, demonstrating that an open-source model can achieve superior performance compared to closed-source models developed by large companies, such as Google's StyleDrop. The source code is available at https://github.com/zhangxulu1996/GAL4Personalization.
△ Less
Submitted 16 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
Authors:
Changmeng Zheng,
Dayong Liang,
Wengyu Zhang,
Xiao-Yong Wei,
Tat-Seng Chua,
Qing Li
Abstract:
This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the iss…
▽ More
This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the issue, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG). In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts. Extensive experiments validate BDoG, achieving state-of-the-art results in Science QA and MMBench with significant improvements over previous methods.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
DreamReward: Text-to-3D Generation with Human Preference
Authors:
Junliang Ye,
Fangfu Liu,
Qixiu Li,
Zhengyi Wang,
Yikai Wang,
Xinzhou Wang,
Yueqi Duan,
Jun Zhu
Abstract:
3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic…
▽ More
3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D -- the first general-purpose text-to-3D human preference reward model to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize the multi-view diffusion models with a redefined scorer. Grounded by theoretical proof and extensive experiment comparisons, our DreamReward successfully generates high-fidelity and 3D consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve text-to-3D models.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Evolution of flat band and role of lattice relaxations in twisted bilayer graphene
Authors:
Qian Li,
Hongyun Zhang,
Yijie Wang,
Wanying Chen,
Changhua Bao,
Qinxin Liu,
Tianyun Lin,
Shuai Zhang,
Haoxiong Zhang,
Kenji Watanabe,
Takashi Taniguchi,
Jose Avila,
Pavel Dudin,
Qunyang Li,
Pu Yu,
Wenhui Duan,
Zhida Song,
Shuyun Zhou
Abstract:
Magic-angle twisted bilayer graphene (MATBG) exhibits correlated phenomena such as superconductivity and Mott insulating state related to the weakly dispersing flat band near the Fermi energy. Beyond its moiré period, such flat band is expected to be sensitive to lattice relaxations. Thus, clarifying the evolution of the electronic structure with twist angle is critical for understanding the physi…
▽ More
Magic-angle twisted bilayer graphene (MATBG) exhibits correlated phenomena such as superconductivity and Mott insulating state related to the weakly dispersing flat band near the Fermi energy. Beyond its moiré period, such flat band is expected to be sensitive to lattice relaxations. Thus, clarifying the evolution of the electronic structure with twist angle is critical for understanding the physics of MATBG. Here, we combine nanospot angle-resolved photoemission spectroscopy and atomic force microscopy to resolve the fine electronic structure of the flat band and remote bands, and their evolution with twist angles from 1.07$^\circ$ to 2.60$^\circ$. Near the magic angle, dispersion is characterized by a flat band near the Fermi energy with a strongly reduced bandwidth. Moreover, near 1.07$^\circ$, we observe a spectral weight transfer between remote bands at higher binding energy and extract the modulated interlayer spacing near the magic angle. Our work provides direct spectroscopic information on flat band physics and highlights the role of lattice relaxations.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Large Exciton Binding Energy in the Bulk van der Waals Magnet CrSBr
Authors:
Shane Smolenski,
Ming Wen,
Qiuyang Li,
Eoghan Downey,
Adam Alfrey,
Wenhao Liu,
Aswin L. N. Kondusamy,
Aaron Bostwick,
Chris Jozwiak,
Eli Rotenberg,
Liuyan Zhao,
Hui Deng,
Bing Lv,
Dominika Zgid,
Emanuel Gull,
Na Hyun Jo
Abstract:
Excitons, bound electron-hole pairs, influence the optical properties in strongly interacting solid state systems. Excitons and their associated many-body physics are typically most stable and pronounced in monolayer materials. Bulk systems with large exciton binding energies, on the other hand, are rare and the mechanisms driving their stability are still relatively unexplored. Here, we report an…
▽ More
Excitons, bound electron-hole pairs, influence the optical properties in strongly interacting solid state systems. Excitons and their associated many-body physics are typically most stable and pronounced in monolayer materials. Bulk systems with large exciton binding energies, on the other hand, are rare and the mechanisms driving their stability are still relatively unexplored. Here, we report an exceptionally large exciton binding energy in single crystals of the bulk van der Waals antiferromagnet CrSBr. Utilizing state-of-the-art angle-resolved photoemission spectroscopy and self-consistent ab-initio GW calculations, we present direct spectroscopic evidence that robust electronic and structural anisotropy can significantly amplify the exciton binding energy within bulk crystals. Furthermore, the application of a vertical electric field enables broad tunability of the optical and electronic properties. Our results indicate that CrSBr is a promising material for the study of the role of anisotropy in strongly interacting bulk systems and for the development of exciton-based optoelectronics.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be…
▽ More
Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level.
△ Less
Submitted 14 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
AdaFish: Fast low-rank parameter-efficient fine-tuning by using second-order information
Authors:
Jiang Hu,
Quanzheng Li
Abstract:
Recent advancements in large-scale pretrained models have significantly improved performance across a variety of tasks in natural language processing and computer vision. However, the extensive number of parameters in these models necessitates substantial memory and computational resources for full training. To adapt these models for downstream tasks or specific application-oriented datasets, para…
▽ More
Recent advancements in large-scale pretrained models have significantly improved performance across a variety of tasks in natural language processing and computer vision. However, the extensive number of parameters in these models necessitates substantial memory and computational resources for full training. To adapt these models for downstream tasks or specific application-oriented datasets, parameter-efficient fine-tuning methods leveraging pretrained parameters have gained considerable attention. However, it can still be time-consuming due to lots of parameters and epochs. In this work, we introduce AdaFish, an efficient algorithm of the second-order type designed to expedite the training process within low-rank decomposition-based fine-tuning frameworks. Our key observation is that the associated generalized Fisher information matrix is either low-rank or extremely small-scaled. Such a generalized Fisher information matrix is shown to be equivalent to the Hessian matrix. Moreover, we prove the global convergence of AdaFish, along with its iteration/oracle complexity. Numerical experiments show that our algorithm is quite competitive with the state-of-the-art AdamW method.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Approximate Unlearning Completeness
Authors:
Cheng-Long Wang,
Qi Li,
Zihang Xiang,
Yinzhi Cao,
Di Wang
Abstract:
By adopting a more flexible definition of unlearning and adjusting the model distribution to simulate training without the targeted data, approximate machine unlearning provides a less resource-demanding alternative to the more laborious exact unlearning methods. Yet, the unlearning completeness of target samples-even when the approximate algorithms are executed faithfully without external threats…
▽ More
By adopting a more flexible definition of unlearning and adjusting the model distribution to simulate training without the targeted data, approximate machine unlearning provides a less resource-demanding alternative to the more laborious exact unlearning methods. Yet, the unlearning completeness of target samples-even when the approximate algorithms are executed faithfully without external threats-remains largely unexamined, raising questions about those approximate algorithms' ability to fulfill their commitment of unlearning during the lifecycle.
In this paper, we introduce the task of Lifecycle Unlearning Commitment Management (LUCM) for approximate unlearning and outline its primary challenges. We propose an efficient metric designed to assess the sample-level unlearning completeness. Our empirical results demonstrate its superiority over membership inference techniques in two key areas: the strong correlation of its measurements with unlearning completeness across various unlearning tasks, and its computational efficiency, making it suitable for real-time applications. Additionally, we show that this metric is able to serve as a tool for monitoring unlearning anomalies throughout the unlearning lifecycle, including both under-unlearning and over-unlearning.
We apply this metric to evaluate the unlearning commitments of current approximate algorithms. Our analysis, conducted across multiple unlearning benchmarks, reveals that these algorithms inconsistently fulfill their unlearning commitments due to two main issues: 1) unlearning new data can significantly affect the unlearning utility of previously requested data, and 2) approximate algorithms fail to ensure equitable unlearning utility across different groups. These insights emphasize the crucial importance of LUCM throughout the unlearning lifecycle. We will soon open-source our newly developed benchmark.
△ Less
Submitted 30 April, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Predicting the stability of profiling signals of small RNAs
Authors:
Qiuyun Li,
Manda Riehl
Abstract:
Profiling is a process that finds similarities between different RNA secondary structures by extracting signals from the Boltzmann sampling. The reproducibility of profiling can be identified by the standard deviation of number of features among Boltzmann samples. We found a strong relationship between the frequency of each helix class and its standard deviation of the frequency upon repeated Bolt…
▽ More
Profiling is a process that finds similarities between different RNA secondary structures by extracting signals from the Boltzmann sampling. The reproducibility of profiling can be identified by the standard deviation of number of features among Boltzmann samples. We found a strong relationship between the frequency of each helix class and its standard deviation of the frequency upon repeated Boltzmann sampling. We developed a perturbation technique to predict the stability of these featured helix classes without the need for repeated Boltzmann sampling, with accuracy between 84% and 94%, depending on the type of RNA. Our technique only requires 0.2% of the computation time compared to one profiling process.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
Context-based Fast Recommendation Strategy for Long User Behavior Sequence in Meituan Waimai
Authors:
Zhichao Feng,
Junjiie Xie,
Kaiyuan Li,
Yu Qin,
Pengfei Wang,
Qianzhong Li,
Bin Yin,
Xiang Li,
Wei Lin,
Shangguang Wang
Abstract:
In the recommender system of Meituan Waimai, we are dealing with ever-lengthening user behavior sequences, which pose an increasing challenge to modeling user preference effectively. Existing sequential recommendation models often fail to capture long-term dependencies or are too complex, complicating the fulfillment of Meituan Waimai's unique business needs. To better model user interests, we con…
▽ More
In the recommender system of Meituan Waimai, we are dealing with ever-lengthening user behavior sequences, which pose an increasing challenge to modeling user preference effectively. Existing sequential recommendation models often fail to capture long-term dependencies or are too complex, complicating the fulfillment of Meituan Waimai's unique business needs. To better model user interests, we consider selecting relevant sub-sequences from users' extensive historical behaviors based on their preferences. In this specific scenario, we've noticed that the contexts in which users interact have a significant impact on their preferences. For this purpose, we introduce a novel method called Context-based Fast Recommendation Strategy to tackle the issue of long sequences. We first identify contexts that share similar user preferences with the target context and then locate the corresponding PoIs based on these identified contexts. This approach eliminates the necessity to select a sub-sequence for every candidate PoI, thereby avoiding high time complexity. Specifically, we implement a prototype-based approach to pinpoint contexts that mirror similar user preferences. To amplify accuracy and interpretability, we employ JS divergence of PoI attributes such as categories and prices as a measure of similarity between contexts. A temporal graph integrating both prototype and context nodes helps incorporate temporal information. We then identify appropriate prototypes considering both target contexts and short-term user preferences. Following this, we utilize contexts aligned with these prototypes to generate a sub-sequence, aimed at predicting CTR and CTCVR scores with target attention. Since its inception in 2023, this strategy has been adopted in Meituan Waimai's display recommender system, leading to a 4.6% surge in CTR and a 4.2% boost in GMV.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations
Authors:
Lirui Luo,
Guoxi Zhang,
Hongming Xu,
Yaodong Yang,
Cong Fang,
Qing Li
Abstract:
Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations, but previous methods cannot refine the structured states with rewards due to a lack of efficiency. Accessibility also remains an issue, as ext…
▽ More
Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations, but previous methods cannot refine the structured states with rewards due to a lack of efficiency. Accessibility also remains an issue, as extensive domain knowledge is required to interpret symbolic policies. In this paper, we present a neuro-symbolic framework for jointly learning structured states and symbolic policies, whose key idea is to distill the vision foundation model into an efficient perception module and refine it during policy learning. Moreover, we design a pipeline to prompt GPT-4 to generate textual explanations for the learned policies and decisions, significantly reducing users' cognitive load to understand the symbolic policies. We verify the efficacy of our approach on nine Atari tasks and present GPT-generated explanations for policies and decisions.
△ Less
Submitted 13 June, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
VideoBadminton: A Video Dataset for Badminton Action Recognition
Authors:
Qi Li,
Tzu-Chen Chiu,
Hsiang-Wei Huang,
Min-Te Sun,
Wei-Shinn Ku
Abstract:
In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications,…
▽ More
In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there's an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton sports. The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners
Authors:
Chi Hu,
Yuan Ge,
Xiangnan Ma,
Hang Cao,
Qiang Li,
Yonghua Yang,
Tong Xiao,
**gbo Zhu
Abstract:
Large Language Models (LLMs) have achieved impressive performance across various reasoning tasks. However, even state-of-the-art LLMs such as ChatGPT are prone to logical errors during their reasoning processes. Existing solutions, such as deploying task-specific verifiers or voting over multiple reasoning paths, either require extensive human annotations or fail in scenarios with inconsistent res…
▽ More
Large Language Models (LLMs) have achieved impressive performance across various reasoning tasks. However, even state-of-the-art LLMs such as ChatGPT are prone to logical errors during their reasoning processes. Existing solutions, such as deploying task-specific verifiers or voting over multiple reasoning paths, either require extensive human annotations or fail in scenarios with inconsistent responses. To address these challenges, we introduce RankPrompt, a new prompting method that enables LLMs to self-rank their responses without additional resources. RankPrompt breaks down the ranking problem into a series of comparisons among diverse responses, leveraging the inherent capabilities of LLMs to generate chains of comparison as contextual exemplars. Our experiments across 11 arithmetic and commonsense reasoning tasks show that RankPrompt significantly enhances the reasoning performance of ChatGPT and GPT-4, with improvements of up to 13%. Moreover, RankPrompt excels in LLM-based automatic evaluations for open-ended tasks, aligning with human judgments 74% of the time in the AlpacaEval dataset. It also exhibits robustness to variations in response order and consistency. Collectively, our results validate RankPrompt as an effective method for eliciting high-quality feedback from language models.
△ Less
Submitted 22 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Advancing COVID-19 Detection in 3D CT Scans
Authors:
Qingqiu Li,
Runtian Yuan,
Junlin Hou,
Jilan Xu,
Yuejie Zhang,
Rui Feng,
Hao Chen
Abstract:
To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior kno…
▽ More
To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior knowledge. Our model achieves a Macro F1 Score of 0.94 on the validation set of the 4th COV19D Competition Challenge $\mathrm{I}$, surpassing the baseline by 16%. This indicates its effectiveness in distinguishing between COVID-19 and non-COVID-19 cases, making it a robust method for COVID-19 detection.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Ultralarge polarization in ferroelectric hafnia-based thin films
Authors:
Han Wu,
Kun Lin,
Qinghua Zhang,
Qian Yu,
Xiaoqian Fu,
Qiang Li,
Meera Cheviri,
Oswaldo Dieguez,
Shuai Xu,
Lin Gu,
Yili Cao,
Jiaou Wang,
Zhen Wang,
Yu Chen,
Huanhua Wang,
**xia Deng,
Jun Miao,
Xianran Xing
Abstract:
Hafnia-based ferroelectrics have become a valuable class of electronic functional materials at the nanoscale, showing great potential for next-generation memory and logic devices. However, more robust ferroelectric properties and better understanding of the polarization mechanisms are currently needed both in technology and science. Herein, we report the properties of oxygen-deficient Hf0.5Zr0.5O2…
▽ More
Hafnia-based ferroelectrics have become a valuable class of electronic functional materials at the nanoscale, showing great potential for next-generation memory and logic devices. However, more robust ferroelectric properties and better understanding of the polarization mechanisms are currently needed both in technology and science. Herein, we report the properties of oxygen-deficient Hf0.5Zr0.5O2 films with ultralarge remanent polarization (Pr) of 387 uC cm-2 at room temperature (1 kHz). Structure characterizations identify a new ferroelectric monoclinic Pc phase in these Hf0.5Zr0.5O2 films. The in-situ STEM measurements evidence polar displacements of the oxygen atoms, which move up and down in the Pc structure under applied DC bias fields, showing a huge displacement (1.6 A). DFT calculations optimized the Pc structure and also predicted a large polarization. The coexistence of the ferroelectric monoclinic (Pc) phases and orthorhombic (Pca21) is responsible for this superior ferroelectric properties. These findings are promising for hafnia-based ferroelectric applications in integrated ferroelectric devices, energy harvesting and actuators, etc.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Domain Adaptation Using Pseudo Labels for COVID-19 Detection
Authors:
Runtian Yuan,
Qingqiu Li,
Junlin Hou,
Jilan Xu,
Yuejie Zhang,
Rui Feng,
Hao Chen
Abstract:
In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent he…
▽ More
In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent health crises. The innovative approach of generating pseudo labels enables the model to iteratively refine its learning process, thereby improving its accuracy and adaptability across different hospitals and medical centres. Experimental results on COV19-CT-DB database showcase the model's potential to achieve high diagnostic precision, significantly contributing to efficient patient management and alleviating the strain on healthcare systems. Our method achieves 0.92 Macro F1 Score on the validation set of Covid-19 domain adaptation challenge.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Authors:
Yue Fan,
Xiaojian Ma,
Rujie Wu,
Yuntao Du,
Jiaqi Li,
Zhi Gao,
Qing Li
Abstract:
We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relations in lengthy videos. In particular, the proposed multimodal agent VideoAgent: 1) constructs a structured memory to store both the generic temporal e…
▽ More
We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relations in lengthy videos. In particular, the proposed multimodal agent VideoAgent: 1) constructs a structured memory to store both the generic temporal event descriptions and object-centric tracking states of the video; 2) given an input task query, it employs tools including video segment localization and object memory querying along with other visual foundation models to interactively solve the task, utilizing the zero-shot tool-use ability of LLMs. VideoAgent demonstrates impressive performances on several long-horizon video understanding benchmarks, an average increase of 6.6% on NExT-QA and 26.0% on EgoSchema over baselines, closing the gap between open-sourced models and private counterparts including Gemini 1.5 Pro.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Pencil: Private and Extensible Collaborative Learning without the Non-Colluding Assumption
Authors:
Xuanqi Liu,
Zhuotao Liu,
Qi Li,
Ke Xu,
Mingwei Xu
Abstract:
The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cry…
▽ More
The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cryptographic constructs like homomorphic encryption (HE) and secure multiparty computation (MPC). However, FL completely overlooks model privacy, and HE has limited extensibility (confined to only one data provider). While the state-of-the-art MPC frameworks provide reasonable throughput and simultaneously ensure model/data privacy, they rely on a critical non-colluding assumption on the computing servers, and relaxing this assumption is still an open problem.
In this paper, we present Pencil, the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption. Our fundamental design principle is to construct the n-party collaborative training protocol based on an efficient two-party protocol, and meanwhile ensuring that switching to different data providers during model training introduces no extra cost. We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis. Our comprehensive evaluations of Pencil demonstrate that (i) models trained in plaintext and models trained privately using Pencil exhibit nearly identical test accuracies; (ii) The training overhead of Pencil is greatly reduced: Pencil achieves 10 ~ 260x higher throughput and 2 orders of magnitude less communication than prior art; (iii) Pencil is resilient against both existing and adaptive (white-box) attacks.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed
Authors:
**zhu Yan,
Haotian Xu,
Zhuotao Liu,
Qi Li,
Ke Xu,
Mingwei Xu,
Jian** Wu
Abstract:
The emerging programmable networks sparked significant research on Intelligent Network Data Plane (INDP), which achieves learning-based traffic analysis at line-speed. Prior art in INDP focus on deploying tree/forest models on the data plane. We observe a fundamental limitation in tree-based INDP approaches: although it is possible to represent even larger tree/forest tables on the data plane, the…
▽ More
The emerging programmable networks sparked significant research on Intelligent Network Data Plane (INDP), which achieves learning-based traffic analysis at line-speed. Prior art in INDP focus on deploying tree/forest models on the data plane. We observe a fundamental limitation in tree-based INDP approaches: although it is possible to represent even larger tree/forest tables on the data plane, the flow features that are computable on the data plane are fundamentally limited by hardware constraints. In this paper, we present BoS to push the boundaries of INDP by enabling Neural Network (NN) driven traffic analysis at line-speed. Many types of NNs (such as Recurrent Neural Network (RNN), and transformers) that are designed to work with sequential data have advantages over tree-based models, because they can take raw network data as input without complex feature computations on the fly. However, the challenge is significant: the recurrent computation scheme used in RNN inference is fundamentally different from the match-action paradigm used on the network data plane. BoS addresses this challenge by (i) designing a novel data plane friendly RNN architecture that can execute unlimited RNN time steps with limited data plane stages, effectively achieving line-speed RNN inference; and (ii) complementing the on-switch RNN model with an off-switch transformer-based traffic analysis module to further boost the overall performance. We implement a prototype of BoS using a P4 programmable switch as our data plane, and extensively evaluate it over multiple traffic analysis tasks. The results show that BoS outperforms state-of-the-art in both analysis accuracy and scalability.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Incorporating Higher-order Structural Information for Graph Clustering
Authors:
Qiankun Li,
Haobing Liu,
Ruobing Jiang,
Tingting Wang
Abstract:
Clustering holds profound significance in data mining. In recent years, graph convolutional network (GCN) has emerged as a powerful tool for deep clustering, integrating both graph structural information and node attributes. However, most existing methods ignore the higher-order structural information of the graph. Evidently, nodes within the same cluster can establish distant connections. Besides…
▽ More
Clustering holds profound significance in data mining. In recent years, graph convolutional network (GCN) has emerged as a powerful tool for deep clustering, integrating both graph structural information and node attributes. However, most existing methods ignore the higher-order structural information of the graph. Evidently, nodes within the same cluster can establish distant connections. Besides, recent deep clustering methods usually apply a self-supervised module to monitor the training process of their model, focusing solely on node attributes without paying attention to graph structure. In this paper, we propose a novel graph clustering network to make full use of graph structural information. To capture the higher-order structural information, we design a graph mutual infomax module, effectively maximizing mutual information between graph-level and node-level representations, and employ a trinary self-supervised module that includes modularity as a structural constraint. Our proposed model outperforms many state-of-the-art methods on various datasets, demonstrating its superiority.
△ Less
Submitted 19 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an a…
▽ More
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominated $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
CSST large-scale structure analysis pipeline: I. constructing reference mock galaxy redshift surveys
Authors:
Yizhou Gu,
Xiaohu Yang,
Jiaxin Han,
Yirong Wang,
Qingyang Li,
Zhenlin Tan,
Wenkang Jiang,
Yaru Wang,
Jiaqi Wang,
Antonios Katsianis,
Xiaoju Xu,
Haojie Xu,
Wensheng Hong,
Houjun Mo,
Run Wen,
Xianzhong Zheng,
Feng Shi,
Pengjie Zhang,
Zhongxu Zhai,
Chengze Liu,
Wenting Wang,
Ying Zu,
Hong Guo,
Youcai Zhang,
Yi Lu
, et al. (7 additional authors not shown)
Abstract:
In this paper, we set out to construct a set of reference mock galaxy redshift surveys (MGRSs) for the future Chinese Space-station Survey Telescope (CSST) observation, where subsequent survey selection effects can be added and evaluated. This set of MGRSs is generated using the dark matter subhalos extracted from a high-resolution Jiutian $N$-body simulation of the standard $Λ$CDM cosmogony with…
▽ More
In this paper, we set out to construct a set of reference mock galaxy redshift surveys (MGRSs) for the future Chinese Space-station Survey Telescope (CSST) observation, where subsequent survey selection effects can be added and evaluated. This set of MGRSs is generated using the dark matter subhalos extracted from a high-resolution Jiutian $N$-body simulation of the standard $Λ$CDM cosmogony with $Ω_m=0.3111$, $Ω_Λ=0.6889$, and $σ_8=0.8102$. The simulation has a boxsize of $1~h^{-1} {\rm Gpc}$, and consists of $6144^3$ particles with mass resolution $3.723 \times 10^{8} h^{-1} M_\odot $. In order to take into account the effect of redshift evolution, we first use all 128 snapshots in the Jiutian simulation to generate a light-cone halo/subhalo catalog. Next, galaxy luminosities are assigned to the main and subhalo populations using the subhalo abundance matching (SHAM) method with the DESI $z$-band luminosity functions at different redshifts. Multi-band photometries, as well as images, are then assigned to each mock galaxy using a 3-dimensional parameter space nearest neighbor sampling of the DESI LS observational galaxies and groups. Finally, the CSST and DESI LS survey geometry and magnitude limit cuts are applied to generate the required MGRSs. As we have checked, this set of MGRSs can generally reproduce the observed galaxy luminosity/mass functions within 0.1 dex for galaxies with $L > 10^8 L_\odot$ (or $M_* > 10^{8.5} M_\odot$) and within 1-$σ$ level for galaxies with $L < 10^8L_\odot$ (or $M_* < 10^{8.5} M_\odot$). Together with the CSST slitless spectra and redshifts for our DESI LS seed galaxies that are under construction, we will set out to test various slitless observational selection effects in subsequent probes.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Joint Optimization for Achieving Covertness in MIMO Over-the-Air Computation Networks
Authors:
Junteng Yao,
Tuo Wu,
Ming **,
Cunhua Pan,
Quanzhong Li,
**hong Yuan
Abstract:
This paper investigates covert data transmission within a multiple-input multiple-output (MIMO) over-the-air computation (AirComp) network, where sensors transmit data to the access point (AP) while guaranteeing covertness to the warden (Willie). Simultaneously, the AP introduces artificial noise (AN) to confuse Willie, meeting the covert requirement. We address the challenge of minimizing mean-sq…
▽ More
This paper investigates covert data transmission within a multiple-input multiple-output (MIMO) over-the-air computation (AirComp) network, where sensors transmit data to the access point (AP) while guaranteeing covertness to the warden (Willie). Simultaneously, the AP introduces artificial noise (AN) to confuse Willie, meeting the covert requirement. We address the challenge of minimizing mean-square-error (MSE) of the AP, while considering transmit power constraints at both the AP and the sensors, as well as ensuring the covert transmission to Willie with a low detection error probability (DEP). However, obtaining globally optimal solutions for the investigated non-convex problem is challenging due to the interdependence of optimization variables. To tackle this problem, we introduce an exact penalty algorithm and transform the optimization problem into a difference-of-convex (DC) form problem to find a locally optimal solution. Simulation results showcase the superior performance in terms of our proposed scheme in comparison to the benchmark schemes.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Cardiac Magnetic Resonance 2D+T Short- and Long-axis Segmentation via Spatio-temporal SAM Adaptation
Authors:
Zhennong Chen,
Sekeun Kim,
Hui Ren,
Quanzheng Li,
Xiang Li
Abstract:
Accurate 2D+T myocardium segmentation in cine cardiac magnetic resonance (CMR) scans is essential to analyze LV motion throughout the cardiac cycle comprehensively. The Segment Anything Model (SAM), known for its accurate segmentation and zero-shot generalization, has not yet been tailored for CMR 2D+T segmentation. We therefore introduce CMR2D+T-SAM, a novel approach to adapt SAM for CMR 2D+T seg…
▽ More
Accurate 2D+T myocardium segmentation in cine cardiac magnetic resonance (CMR) scans is essential to analyze LV motion throughout the cardiac cycle comprehensively. The Segment Anything Model (SAM), known for its accurate segmentation and zero-shot generalization, has not yet been tailored for CMR 2D+T segmentation. We therefore introduce CMR2D+T-SAM, a novel approach to adapt SAM for CMR 2D+T segmentation using spatio-temporal adaption. This approach also incorporates a U-Net framework for multi-scale feature extraction, as well as text prompts for accurate segmentation on both short-axis (SAX) and long-axis (LAX) views using a single model. CMR2D+T-SAM outperforms existing deep learning methods on the STACOM2011 dataset, achieving a myocardium Dice score of 0.885 and a Hausdorff distance (HD) of 2.900 pixels. It also demonstrates superior zero-shot generalization on the ACDC dataset with a Dice score of 0.840 and a HD of 4.076 pixels.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Tracking of charged particles with nanosecond lifetimes at LHCb
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
J. A. Adams,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1060 additional authors not shown)
Abstract:
A method is presented to reconstruct charged particles with lifetimes between 10 ps and 10 ns, which considers a combination of their decay products and the partial tracks created by the initial charged particle. Using the $Ξ^-$ baryon as a benchmark, the method is demonstrated with simulated events and proton-proton collision data at $\sqrt{s}=13$ TeV, corresponding to an integrated luminosity of…
▽ More
A method is presented to reconstruct charged particles with lifetimes between 10 ps and 10 ns, which considers a combination of their decay products and the partial tracks created by the initial charged particle. Using the $Ξ^-$ baryon as a benchmark, the method is demonstrated with simulated events and proton-proton collision data at $\sqrt{s}=13$ TeV, corresponding to an integrated luminosity of 2.0 fb${}^{-1}$ collected with the LHCb detector in 2018. Significant improvements in the angular resolution and the signal purity are obtained. The method is implemented as part of the LHCb Run 3 event trigger in a set of requirements to select detached hyperons. This is the first demonstration of the applicability of this approach at the LHC, and the first to show its scaling with instantaneous luminosity.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Anatomical Structure-Guided Medical Vision-Language Pre-training
Authors:
Qingqiu Li,
Xiaohan Yan,
Jilan Xu,
Runtian Yuan,
Yuejie Zhang,
Rui Feng,
Quanli Shen,
Xiaobo Zhang,
Shujun Wang
Abstract:
Learning medical visual representations through vision-language pre-training has reached remarkable progress. Despite the promising performance, it still faces challenges, i.e., local alignment lacks interpretability and clinical relevance, and the insufficient internal and external representation learning of image-report pairs. To address these issues, we propose an Anatomical Structure-Guided (A…
▽ More
Learning medical visual representations through vision-language pre-training has reached remarkable progress. Despite the promising performance, it still faces challenges, i.e., local alignment lacks interpretability and clinical relevance, and the insufficient internal and external representation learning of image-report pairs. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets <anatomical region, finding, existence>, and fully utilize each element as supervision to enhance representation learning. For anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, considering them as the minimum semantic units to explore fine-grained local alignment. For finding and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample and constructing soft labels for contrastive learning to improve the semantic association of different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks, including five public benchmarks. Experimental results demonstrate that our method outperforms the state-of-the-art methods.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Origin of light-induced metastability in ZrTe$_5$
Authors:
D. Nevola,
N. Aryal,
G. D. Gu,
P. D. Johnson,
W. -G. Yin,
Q. Li
Abstract:
We study the non-equilibrium electronic structure of a model Dirac semimetal ZrTe$_5$ by using time-and-angle resolved photoemission spectroscopy and density functional theory-based electron and phonon calculations. By measuring the electronic dispersion near the $Γ$ point at time delays up to 10 picoseconds, we discovered that the band spectral weight does not recover during the measured temporal…
▽ More
We study the non-equilibrium electronic structure of a model Dirac semimetal ZrTe$_5$ by using time-and-angle resolved photoemission spectroscopy and density functional theory-based electron and phonon calculations. By measuring the electronic dispersion near the $Γ$ point at time delays up to 10 picoseconds, we discovered that the band spectral weight does not recover during the measured temporal window, revealing the existence of light induced metastable state in the electronic structure of this material. Our calculations find that the photoexcited $A_{1g}$ phonon mode lead to a band renormalization that both supports our experimental observations at the zone center and predicts changes to the band structure outside of our experimental window, ultimately showing the evolution from a direct to an indirect gap semimetal; such band renormalization dramatically reduces the electron-hole recombination rate giving rise to the metastability in this system.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
Model Will Tell: Training Membership Inference for Diffusion Models
Authors:
Xiaomeng Fu,
Xi Wang,
Qiao Li,
** Liu,
Jiao Dai,
Jizhong Han
Abstract:
Diffusion models pose risks of privacy breaches and copyright disputes, primarily stemming from the potential utilization of unauthorized data during the training phase. The Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model, representing a critical tool for privacy violation verification. However, the increa…
▽ More
Diffusion models pose risks of privacy breaches and copyright disputes, primarily stemming from the potential utilization of unauthorized data during the training phase. The Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model, representing a critical tool for privacy violation verification. However, the increased stochasticity inherent in diffusion renders traditional shadow-model-based or metric-based methods ineffective when applied to diffusion models. Moreover, existing methods only yield binary classification labels which lack necessary comprehensibility in practical applications. In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model. Compared with unseen samples, training samples exhibit stronger generative priors within the diffusion model, enabling the successful reconstruction of substantially degraded training images. Consequently, we propose the Degrade Restore Compare (DRC) framework. In this framework, an image undergoes sequential degradation and restoration, and its membership is determined by comparing it with the restored counterpart. Experimental results verify that our approach not only significantly outperforms existing methods in terms of accuracy but also provides comprehensible decision criteria, offering evidence for potential privacy violations.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
Rediscovering BCE Loss for Uniform Classification
Authors:
Qiufu Li,
Xi Jia,
Jiancan Zhou,
Linlin Shen,
**ming Duan
Abstract:
This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than adaptive threshold classifying each individual sample. We also propose the uniform classification accuracy as a metric to measure the model's performance in uniform classification. Furthermore, begin with a naive loss, we mathematically derive a loss function suitable…
▽ More
This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than adaptive threshold classifying each individual sample. We also propose the uniform classification accuracy as a metric to measure the model's performance in uniform classification. Furthermore, begin with a naive loss, we mathematically derive a loss function suitable for the uniform classification, which is the BCE function integrated with a unified bias. We demonstrate the unified threshold could be learned via the bias. The extensive experiments on six classification datasets and three feature extraction models show that, compared to the SoftMax loss, the models trained with the BCE loss not only exhibit higher uniform classification accuracy but also higher sample-wise classification accuracy. In addition, the learned bias from BCE loss is very close to the unified threshold used in the uniform classification. The features extracted by the models trained with BCE loss not only possess uniformity but also demonstrate better intra-class compactness and inter-class distinctiveness, yielding superior performance on open-set tasks such as face recognition.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction
Authors:
Qing Xiao,
Siyeop Yoon,
Hui Ren,
Matthew Tivnan,
Lichao Sun,
Quanzheng Li,
Tianming Liu,
Yu Zhang,
Xiang Li
Abstract:
Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffe…
▽ More
Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffer from temporal sparsity and incompleteness, presenting substantial challenges in modeling the disease's progression accurately. Existing methods are limited, focusing primarily on datasets without missing entries or requiring predefined assumptions about CTh progression. To overcome these obstacles, we propose a conditional score-based diffusion model specifically designed to generate CTh trajectories with the given baseline information, such as age, sex, and initial diagnosis. Our conditional diffusion model utilizes all available data during the training phase to make predictions based solely on baseline information during inference without needing prior history about CTh progression. The prediction accuracy of the proposed CTh prediction pipeline using a conditional score-based model was compared for sub-groups consisting of cognitively normal, mild cognitive impairment, and AD subjects. The Bland-Altman analysis shows our diffusion-based prediction model has a near-zero bias with narrow 95% confidential interval compared to the ground-truth CTh in 6-36 months. In addition, our conditional diffusion model has a stochastic generative nature, therefore, we demonstrated an uncertainty analysis of patient-specific CTh prediction through multiple realizations.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation
Authors:
Pan He,
Quanyi Li,
Xiaoyong Yuan,
Bolei Zhou
Abstract:
Traffic signal control (TSC) is crucial for reducing traffic congestion that leads to smoother traffic flow, reduced idling time, and mitigated CO2 emissions. In this study, we explore the computer vision approach for TSC that modulates on-road traffic flows through visual observation. Unlike traditional feature-based approaches, vision-based methods depend much less on heuristics and predefined f…
▽ More
Traffic signal control (TSC) is crucial for reducing traffic congestion that leads to smoother traffic flow, reduced idling time, and mitigated CO2 emissions. In this study, we explore the computer vision approach for TSC that modulates on-road traffic flows through visual observation. Unlike traditional feature-based approaches, vision-based methods depend much less on heuristics and predefined features, bringing promising potentials for end-to-end learning and optimization of traffic signals. Thus, we introduce a holistic traffic simulation framework called TrafficDojo towards vision-based TSC and its benchmarking by integrating the microscopic traffic flow provided in SUMO into the driving simulator MetaDrive. This proposed framework offers a versatile traffic environment for in-depth analysis and comprehensive evaluation of traffic signal controllers across diverse traffic conditions and scenarios. We establish and compare baseline algorithms including both traditional and Reinforecment Learning (RL) approaches. This work sheds insights into the design and development of vision-based TSC approaches and open up new research opportunities. All the code and baselines will be made publicly available.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting
Authors:
Wenting Chen,
Pengyu Wang,
Hui Ren,
Lichao Sun,
Quanzheng Li,
Yixuan Yuan,
Xiang Li
Abstract:
Data scarcity and privacy concerns limit the availability of high-quality medical images for public use, which can be mitigated through medical image synthesis. However, current medical image synthesis methods often struggle to accurately capture the complexity of detailed anatomical structures and pathological conditions. To address these challenges, we propose a novel medical image synthesis mod…
▽ More
Data scarcity and privacy concerns limit the availability of high-quality medical images for public use, which can be mitigated through medical image synthesis. However, current medical image synthesis methods often struggle to accurately capture the complexity of detailed anatomical structures and pathological conditions. To address these challenges, we propose a novel medical image synthesis model that leverages fine-grained image-text alignment and anatomy-pathology prompts to generate highly detailed and accurate synthetic medical images. Our method integrates advanced natural language processing techniques with image generative modeling, enabling precise alignment between descriptive text prompts and the synthesized images' anatomical and pathological details. The proposed approach consists of two key components: an anatomy-pathology prompting module and a fine-grained alignment-based synthesis module. The anatomy-pathology prompting module automatically generates descriptive prompts for high-quality medical images. To further synthesize high-quality medical images from the generated prompts, the fine-grained alignment-based synthesis module pre-defines a visual codebook for the radiology dataset and performs fine-grained alignment between the codebook and generated prompts to obtain key patches as visual clues, facilitating accurate image synthesis. We validate the superiority of our method through experiments on public chest X-ray datasets and demonstrate that our synthetic images preserve accurate semantic information, making them valuable for various medical applications.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Determination of the number of $ψ(3686)$ events taken at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be…
▽ More
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.
△ Less
Submitted 28 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.