-
Red Teaming Language Models for Contradictory Dialogues
Authors:
Xiaofei Wen,
Bangzheng Li,
Tenghao Huang,
Muhao Chen
Abstract:
Most language models currently available are prone to self-contradiction during dialogues. To mitigate this issue, this study explores a novel contradictory dialogue processing task that aims to detect and modify contradictory statements in a conversation. This task is inspired by research on context faithfulness and dialogue comprehension, which have demonstrated that the detection and understanding of contradictions often necessitate detailed explanations. We develop a dataset comprising contradictory dialogues, in which one side of the conversation contradicts itself. Each dialogue is accompanied by an explanatory label that highlights the location and details of the contradiction. With this dataset, we present a Red Teaming framework for contradictory dialogue processing. The framework detects and attempts to explain the dialogue, then modifies the existing contradictory content using the explanation. Our experiments demonstrate that the framework improves the ability to detect contradictory dialogues and provides valid explanations. Additionally, it showcases distinct capabilities for modifying such dialogues. Our study highlights the importance of the logical inconsistency problem in conversational AI.
Submitted 16 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Performance testing of a novel short axis photomultiplier tube for the HUNT project
Authors:
Yijiang Peng,
Zike Wang,
Bo Gao,
Yiyue Tang,
Mingjun Chen,
Kai Li,
Ling Ren,
Xiaohao You,
Maoyuan Liu
Abstract:
Photomultiplier tubes (PMTs) with large-area cathodes are increasingly being used in cosmic-ray experiments to enhance detection efficiency. The optical modules (OMs) of the High-Energy Underwater Neutrino Telescope (HUNT) have employed a brand-new N6205 20-inch microchannel plate photomultiplier tube (MCP-PMT) developed by the North Night Vision Science & Technology (Nanjing) Research Institute Co. Ltd. (NNVT). To make the 20-inch PMT fit into the 23-inch diameter pressure-resistant glass sphere, NNVT improved the internal structure of the PMT and shortened its height by more than 10~cm. The first batch of these PMTs has been delivered for preliminary research work. This paper describes a dedicated PMT testing platform built for the first batch of 15 MCP-PMTs and reports measurements of performance parameters such as the P/V ratio, TTS, and nonlinearity. The measurement results show that the new PMT still has good performance and can meet the requirements of the HUNT project.
Submitted 16 May, 2024;
originally announced May 2024.
-
Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
V. Batozskaya,
D. Becker,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
J. Bloms,
A. Bortone,
I. Boyko
, et al. (559 additional authors not shown)
Abstract:
We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90\% confidence level, respectively.
Submitted 14 May, 2024;
originally announced May 2024.
-
3D Shape Augmentation with Content-Aware Shape Resizing
Authors:
Mingxiang Chen,
Jian Zhang,
Boli Zhou,
Yang Song
Abstract:
Recent advancements in deep learning for 3D models have propelled breakthroughs in generation, detection, and scene understanding. However, the effectiveness of these algorithms hinges on large training datasets. We address the challenge by introducing Efficient 3D Seam Carving (E3SC), a novel 3D model augmentation method based on seam carving, which progressively deforms only part of the input model while ensuring the overall semantics are unchanged. Experiments show that our approach is capable of producing diverse and high-quality augmented 3D shapes across various types and styles of input models, achieving considerable improvements over previous methods. Quantitative evaluations demonstrate that our method effectively enhances the novelty and quality of shapes generated by other subsequent 3D generation algorithms.
Submitted 14 May, 2024;
originally announced May 2024.
-
Charge-Transfer Hyperbolic Polaritons in $α$-MoO$_3$/graphene heterostructures
Authors:
J. Shen,
M. Chen,
V. Korostelev,
H. Kim,
P. Fathi-Hafshejani,
M. Mahjouri-Samani,
K. Klyukin,
G-H. Lee,
S. Dai
Abstract:
Charge transfer is a fundamental interface process that can be harnessed for light detection, photovoltaics, and photosynthesis. Recently, charge transfer was exploited in nanophotonics to alter plasmon polaritons by involving additional non-polaritonic materials to activate the charge transfer. Yet, direct charge transfer between polaritonic materials has not been demonstrated. We report the direct charge transfer in pure polaritonic van der Waals (vdW) heterostructures of $α$-MoO$_3$/graphene. We extracted a Fermi energy of 0.6 eV for graphene by infrared nano-imaging of charge transfer hyperbolic polaritons in the vdW heterostructure. This unusually high Fermi energy is attributed to the charge transfer between graphene and $α$-MoO$_3$. Moreover, we have observed charge transfer hyperbolic polaritons in multiple energy-momentum dispersion branches with a wavelength elongation of up to 150%. With support from the DFT calculation, we find that the charge transfer between graphene and $α$-MoO$_3$, absent in mechanically assembled vdW heterostructures, is attributed to the relatively pristine heterointerface preserved in the epitaxially grown vdW heterostructure. The direct charge transfer and charge transfer hyperbolic polaritons demonstrated in our work hold great promise for developing nano-optical circuits, computational devices, communication systems, and light and energy manipulation devices.
Submitted 14 May, 2024;
originally announced May 2024.
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Authors:
Zhimin Li,
Jianwei Zhang,
Qin Lin,
Jiangfeng Xiong,
Yanxin Long,
Xinchi Deng,
Yingfang Zhang,
Xingchao Liu,
Minbin Huang,
Zedong Xiao,
Dayou Chen,
Jiajun He,
Jiahao Li,
Wenyue Li,
Chen Zhang,
Rongwei Quan,
Jianxiang Lu,
Jiabin Huang,
Xiaoyan Yuan,
Xiaoxiao Zheng,
Yixuan Li,
Jihong Zhang,
Chao Zhang,
Meng Chen,
Jie Liu
, et al. (20 additional authors not shown)
Abstract:
We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT
Submitted 14 May, 2024;
originally announced May 2024.
-
Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (635 additional authors not shown)
Abstract:
Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions $\mathcal{B}(χ_{c1}(3872)\toγψ_2(3823), ψ_2(3823)\toγχ_{c1})/\mathcal{B}(χ_{c1}(3872)\toπ^+π^- J/ψ)$ is set as 0.075 at the 90\% confidence level. Our result contradicts theoretical predictions under the assumption that the $χ_{c1}(3872)$ is the pure charmonium state $χ_{c1}(2P)$.
Submitted 13 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at a few months level in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
Submitted 13 May, 2024;
originally announced May 2024.
-
Decomposing weather forecasting into advection and convection with neural networks
Authors:
Mengxuan Chen,
Ziqi Yuan,
Jinxiao Zhang,
Runmin Dong,
Haohuan Fu
Abstract:
Operational weather forecasting models have advanced for decades on both the explicit numerical solvers and the empirical physical parameterization schemes. However, the high computational costs and uncertainties involved in these existing schemes call for improvements through alternative machine learning methods. Previous works use a unified model to learn the dynamics and physics of the atmospheric model. In contrast, we propose a simple yet effective machine learning model that learns the horizontal movement in the dynamical core and vertical movement in the physical parameterization separately. By replacing the advection with a graph attention network and the convection with a multi-layer perceptron, our model provides a new and efficient perspective to simulate the transition of variables in atmospheric models. We also assess the model's performance over a 5-day iterative forecast. Under the same input variables and training methods, our model outperforms existing data-driven methods at a resolution of 5.625 deg with a significantly reduced number of parameters. Overall, this work aims to contribute to the ongoing efforts that leverage machine learning techniques for improving both the accuracy and efficiency of global weather forecasting.
Submitted 10 May, 2024;
originally announced May 2024.
-
Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the $p\bar{p}π^0$ energy threshold, we can probe the threshold behavior for this reaction. However, no anomalous threshold enhancement is found in the cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$.
Submitted 10 May, 2024;
originally announced May 2024.
-
Evaluating Adversarial Robustness in the Spatial Frequency Domain
Authors:
Keng-Hsin Liao,
Chin-Yuan Yeh,
Hsi-Wen Chen,
Ming-Syan Chen
Abstract:
Convolutional Neural Networks (CNNs) have dominated the majority of computer vision tasks. However, CNNs' vulnerability to adversarial attacks has raised concerns about deploying these models to safety-critical applications. In contrast, the Human Visual System (HVS), which utilizes spatial frequency channels to process visual signals, is immune to adversarial attacks. As such, this paper presents an empirical study exploring the vulnerability of CNN models in the frequency domain. Specifically, we utilize the discrete cosine transform (DCT) to construct the Spatial-Frequency (SF) layer to produce a block-wise frequency spectrum of an input image and formulate Spatial Frequency CNNs (SF-CNNs) by replacing the initial feature extraction layers of widely-used CNN backbones with the SF layer. Through extensive experiments, we observe that SF-CNN models are more robust than their CNN counterparts under both white-box and black-box attacks. To further explain the robustness of SF-CNNs, we compare the SF layer with a trainable convolutional layer with identical kernel sizes using two mixing strategies to show that the lower frequency components contribute the most to the adversarial robustness of SF-CNNs. We believe our observations can guide the future design of robust CNN models.
Submitted 10 May, 2024;
originally announced May 2024.
-
TransAnaNet: Transformer-based Anatomy Change Prediction Network for Head and Neck Cancer Patient Radiotherapy
Authors:
Meixu Chen,
Kai Wang,
Michael Dohopolski,
Howard Morgan,
David Sher,
Jing Wang
Abstract:
Early identification of head and neck cancer (HNC) patients who would experience significant anatomical change during radiotherapy (RT) is important to optimize patient clinical benefit and treatment resources. This study aims to assess the feasibility of using a vision-transformer (ViT) based neural network to predict RT-induced anatomic change in HNC patients. We retrospectively included 121 HNC patients treated with definitive RT/CRT. We collected the planning CT (pCT), planned dose, CBCTs acquired at the initial treatment (CBCT01) and fraction 21 (CBCT21), and primary tumor volume (GTVp) and involved nodal volume (GTVn) delineated on both pCT and CBCTs for model construction and evaluation. A UNet-style ViT network was designed to learn spatial correspondence and contextual information from embedded CT, dose, CBCT01, GTVp, and GTVn image patches. The model estimated the deformation vector field between CBCT01 and CBCT21 as the prediction of anatomic change, and deformed CBCT01 was used as the prediction of CBCT21. We also generated binary masks of GTVp, GTVn, and patient body for volumetric change evaluation. The predicted image from the proposed method yielded the best similarity to the real image (CBCT21) over pCT, CBCT01, and predicted CBCTs from other comparison models. The average MSE and SSIM between the normalized predicted CBCT and CBCT21 are 0.009 and 0.933, while the average Dice coefficients for the body, GTVp, and GTVn masks are 0.972, 0.792, and 0.821, respectively. The proposed method showed promising performance for predicting radiotherapy-induced anatomic change, which has the potential to assist in the decision-making of HNC Adaptive RT.
Submitted 22 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Advancing Head and Neck Cancer Survival Prediction via Multi-Label Learning and Deep Model Interpretation
Authors:
Meixu Chen,
Kai Wang,
Jing Wang
Abstract:
A comprehensive and reliable survival prediction model is of great importance to assist in the personalized management of Head and Neck Cancer (HNC) patients treated with curative Radiation Therapy (RT). In this work, we propose IMLSP, an Interpretable Multi-Label multi-modal deep Survival Prediction framework for predicting multiple HNC survival outcomes simultaneously and providing time-event specific visual explanation of the deep prediction process. We adopt Multi-Task Logistic Regression (MTLR) layers to convert survival prediction from a regression problem to a multi-time point classification task, and to enable prediction of multiple relevant survival outcomes at the same time. We also present Grad-TEAM, a Gradient-weighted Time-Event Activation Mapping approach specifically developed for deep survival model visual explanation, to generate patient-specific time-to-event activation maps. We evaluate our method with the publicly available RADCURE HNC dataset, where it outperforms the corresponding single-modal models and single-label models on all survival outcomes. The generated activation maps show that the model focuses primarily on the tumor and nodal volumes when making the decision and the volume of interest varies for high- and low-risk patients. We demonstrate that the multi-label learning strategy can improve the learning efficiency and prognostic performance, while the interpretable survival prediction model is promising to help understand the decision-making process of AI and facilitate personalized treatment.
Submitted 8 May, 2024;
originally announced May 2024.
-
Blockchains for Internet of Things: Fundamentals, Applications, and Challenges
Authors:
Yusen Wu,
Ye Hu,
Mingzhe Chen,
Yelena Yesha,
Mérouane Debbah
Abstract:
Internet of Things (IoT) services necessitate the storage, transmission, and analysis of diverse data for inference, autonomy, and control. Blockchains, with their inherent properties of decentralization and security, offer efficient database solutions for these devices through consensus-based data sharing. However, it's essential to recognize that not every blockchain system is suitable for specific IoT applications, and some might be more beneficial when excluded with privacy concerns. For example, public blockchains are not suitable for storing sensitive data. This paper presents a detailed review of three distinct blockchains tailored for enhancing IoT applications. We initially delve into the foundational aspects of three blockchain systems, highlighting their strengths, limitations, and implementation needs. Additionally, we discuss the security issues in different blockchains. Subsequently, we explore the blockchain's application in three pivotal IoT areas: edge AI, communications, and healthcare. We underscore potential challenges and the future directions for integrating different blockchains in IoT. Ultimately, this paper aims to offer a comprehensive perspective on the synergies between blockchains and the IoT ecosystem, highlighting the opportunities and complexities involved.
Submitted 14 June, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
When Foresight Pruning Meets Zeroth-Order Optimization: Efficient Federated Learning for Low-Memory Devices
Authors:
Pengyu Zhang,
Yingjie Liu,
Yingbo Zhou,
Xiao Du,
Xian Wei,
Ting Wang,
Mingsong Chen
Abstract:
Although Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design, it fails to work on low-memory AIoT devices due to its heavy memory usage. To address this problem, various federated pruning methods have been proposed to reduce memory usage during inference. However, few of them can substantially mitigate the memory burdens during pruning and training. As an alternative, zeroth-order or backpropagation-free (BP-Free) methods can partially alleviate the memory consumption, but they scale poorly and incur large computation overheads, since the gradient estimation error and floating point operations (FLOPs) increase as the dimensionality of the model parameters grows. In this paper, we propose a federated foresight pruning method based on Neural Tangent Kernel (NTK), which can seamlessly integrate with federated BP-Free training frameworks. We present an approximation to the computation of federated NTK by using the local NTK matrices. Moreover, we demonstrate that the data-free property of our method can substantially reduce the approximation error in extreme data heterogeneity scenarios. Since our approach improves the performance of the vanilla BP-Free method with fewer FLOPs and truly alleviates memory pressure during training and inference, it makes FL more friendly to low-memory devices. Comprehensive experimental results obtained from simulation- and real test-bed-based platforms show that our federated foresight-pruning method not only preserves the ability of the dense model with a memory reduction up to 9x but also boosts the performance of the vanilla BP-Free method with dramatically fewer FLOPs.
Submitted 7 May, 2024;
originally announced May 2024.
-
Enabling Privacy-Preserving and Publicly Auditable Federated Learning
Authors:
Huang Zeng,
Anjia Yang,
Jian Weng,
Min-Rong Chen,
Fengjun Xiao,
Yi Liu,
Ye Yao
Abstract:
Federated learning (FL) has attracted widespread attention because it supports the joint training of models by multiple participants without moving private datasets. However, there are still many security issues in FL that deserve discussion. In this paper, we consider three major issues: 1) how to ensure that the training process can be publicly audited by any third party; 2) how to avoid the influence of malicious participants on training; 3) how to ensure that private gradients and models are not leaked to third parties. Many solutions have been proposed to address these issues, but solving all three problems simultaneously is seldom considered. In this paper, we propose a publicly auditable and privacy-preserving federated learning scheme that is resistant to malicious participants uploading gradients with wrong directions and enables anyone to audit and verify the correctness of the training process. In particular, we design a robust aggregation algorithm capable of detecting gradients with wrong directions from malicious participants. Then, we design a random vector generation algorithm and combine it with zero sharing and blockchain technologies to make the joint training process publicly auditable, meaning anyone can verify the correctness of the training. Finally, we conduct a series of experiments, and the experimental results show that the model generated by the protocol is comparable in accuracy to the original FL approach while keeping security advantages.
Submitted 7 May, 2024;
originally announced May 2024.
-
The weighted and shifted seven-step BDF method for parabolic equations
Authors:
Georgios Akrivis,
Minghua Chen,
Fan Yu
Abstract:
Stability of the BDF methods of order up to five for parabolic equations can be established by the energy technique via Nevanlinna--Odeh multipliers. The nonexistence of Nevanlinna--Odeh multipliers makes the six-step BDF method special; however, the energy technique was recently extended by the authors in [Akrivis et al., SIAM J. Numer. Anal. \textbf{59} (2021) 2449--2472] and covers all six stab…
▽ More
Stability of the BDF methods of order up to five for parabolic equations can be established by the energy technique via Nevanlinna--Odeh multipliers. The nonexistence of Nevanlinna--Odeh multipliers makes the six-step BDF method special; however, the energy technique was recently extended by the authors in [Akrivis et al., SIAM J. Numer. Anal. \textbf{59} (2021) 2449--2472] and covers all six stable BDF methods. The seven-step BDF method is unstable for parabolic equations, since it is not even zero-stable. In this work, we construct and analyze a stable linear combination of two non zero-stable schemes, the seven-step BDF method and its shifted counterpart, referred to as WSBDF7 method. The stability regions of the WSBDF$q, q\leqslant 7$, with a weight $\vartheta\geqslant1$, increase as $\vartheta$ increases, are larger than the stability regions of the classical BDF$q,$ corresponding to $\vartheta=1$. We determine novel and suitable multipliers for the WSBDF7 method and establish stability for parabolic equations by the energy technique. The proposed approach is applicable for mean curvature flow, gradient flows, fractional equations and nonlinear equations.
Submitted 5 May, 2024;
originally announced May 2024.
-
Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?
Authors:
M. Saquib Sarfraz,
Mei-Yen Chen,
Lukas Layer,
Kunyu Peng,
Marios Koulakis
Abstract:
The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods and evaluation practices. Our position advocates for a shift in focus from solely pursuing novel model designs to improving benchmarking practices, creating non-trivial datasets, and critically evaluating the utility of complex methods against simpler baselines. Our findings demonstrate the need for rigorous evaluation protocols, the creation of simple baselines, and the revelation that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increment of model complexity in the state-of-the-art deep-learning based models unfortunately offers very little improvement. We offer insights and suggestions for the field to move forward.
Code: https://github.com/ssarfraz/QuoVadisTAD
Submitted 5 June, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
Towards General Neural Surrogate Solvers with Specialized Neural Accelerators
Authors:
Chenkai Mao,
Robert Lupoiu,
Tianxiang Dai,
Mingkun Chen,
Jonathan A. Fan
Abstract:
Surrogate neural network-based partial differential equation (PDE) solvers have the potential to solve PDEs in an accelerated manner, but they are largely limited to systems featuring fixed domain sizes, geometric layouts, and boundary conditions. We propose Specialized Neural Accelerator-Powered Domain Decomposition Methods (SNAP-DDM), a DDM-based approach to PDE solving in which subdomain problems containing arbitrary boundary conditions and geometric parameters are accurately solved using an ensemble of specialized neural operators. We tailor SNAP-DDM to 2D electromagnetics and fluidic flow problems and show how innovations in network architecture and loss function engineering can produce specialized surrogate subdomain solvers with near unity accuracy. We utilize these solvers with standard DDM algorithms to accurately solve freeform electromagnetics and fluids problems featuring a wide range of domain sizes.
Submitted 14 June, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications
Authors:
Hanwen Zhang,
Mingzhe Chen,
Alireza Vahid,
Haijian Sun
Abstract:
Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor generalizability and scarce training data. In this paper, we propose a fractional programming (FP) based deep unfolding (DU) approach to address the resource allocation problem for weighted sum rate optimization in RSMA. By carefully designing the penalty function, we couple the variable update with the projected gradient descent (PGD) algorithm. Following the structure of PGD, we embed a few learnable parameters in each layer of the DU network. Through extensive simulation, we show that the proposed model-based neural network performs comparably to the optimal results given by the traditional algorithm, but with much lower computational complexity, less training data, and higher resilience to test set data and out-of-distribution (OOD) data.
Submitted 2 May, 2024;
originally announced May 2024.
-
Reverse Influential Community Search Over Social Networks (Technical Report)
Authors:
Qi Wen,
Nan Zhang,
Yutong Ye,
Xiang Lian,
Mingsong Chen
Abstract:
As an important fundamental task of numerous real-world applications such as social network analysis and online advertising/marketing, several prior works studied influential community search, which retrieves a community with high structural cohesiveness and maximum influences on other users in social networks. However, previous works usually considered the influences of the community on arbitrary users in social networks, rather than specific groups (e.g., customer groups, or senior communities). Inspired by this, we propose a novel Reverse Influential Community Search (RICS) problem, which obtains a seed community with the maximum influence on a user-specified target community, satisfying both structural and keyword constraints. To efficiently tackle the RICS problem, we design effective pruning strategies to filter out false alarms of candidate seed communities, and propose an effective index mechanism to facilitate the community retrieval. We also formulate and tackle an RICS variant, named Relaxed Reverse Influential Community Search (R2ICS), which returns a subgraph with the relaxed structural constraints and having the maximum influence on a user-specified target community. Comprehensive experiments have been conducted to verify the efficiency and effectiveness of our RICS and R2ICS approaches on both real-world and synthetic social networks under various parameter settings.
Submitted 7 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
Authors:
Yuan Tang,
Xu Han,
Xianzhi Li,
Qiao Yu,
Yixue Hao,
Long Hu,
Min Chen
Abstract:
Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, which hinders the development of 3D-LLMs. In this paper, we introduce MiniGPT-3D, an efficient and powerful 3D-LLM that achieves multiple SOTA results while training for only 27 hours on one RTX 3090. Specifically, we propose to align 3D point clouds with LLMs using 2D priors from 2D-LLMs, which can leverage the similarity between 2D and 3D visual information. We introduce a novel four-stage training strategy for modality alignment in a cascaded way, and a mixture of query experts module to adaptively aggregate features with high efficiency. Moreover, we utilize parameter-efficient fine-tuning methods LoRA and Norm fine-tuning, resulting in only 47.8M learnable parameters, which is up to 260x fewer than existing methods. Extensive experiments show that MiniGPT-3D achieves SOTA on 3D object classification and captioning tasks, with significantly cheaper training costs. Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800. We are the first to explore the efficient 3D-LLM, offering new insights to the community. Code and weights are available at https://github.com/TangYuan96/MiniGPT-3D.
Submitted 2 May, 2024;
originally announced May 2024.
-
Koopman-based Deep Learning for Nonlinear System Estimation
Authors:
Zexin Sun,
Mingyu Chen,
John Baillieul
Abstract:
Nonlinear differential equations are encountered as models of fluid flow, spiking neurons, and many other systems of interest in the real world. Common features of these systems are that their behaviors are difficult to describe exactly and invariably unmodeled dynamics present challenges in making precise predictions. In many cases the models exhibit extremely complicated behavior due to bifurcations and chaotic regimes. In this paper, we present a novel data-driven linear estimator that uses Koopman operator theory to extract finite-dimensional representations of complex nonlinear systems. The extracted model is used together with a deep reinforcement learning network that learns the optimal stepwise actions to predict future states of the original nonlinear system. Our estimator is also adaptive to a diffeomorphic transformation of the nonlinear system which enables transfer learning to compute state estimates of the transformed system without relearning from scratch.
Submitted 1 May, 2024;
originally announced May 2024.
-
Causal Evaluation of Language Models
Authors:
Sirui Chen,
Bo Peng,
Meiqi Chen,
Ruiqi Wang,
Mengying Xu,
Xingyu Zeng,
Rui Zhao,
Shengjie Zhao,
Yu Qiao,
Chaochao Lu
Abstract:
Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive benchmark for evaluating the causal reasoning capabilities of language models. First, we propose the CaLM framework, which establishes a foundational taxonomy consisting of four modules: causal target (i.e., what to evaluate), adaptation (i.e., how to obtain the results), metric (i.e., how to measure the results), and error (i.e., how to analyze the bad results). This taxonomy defines a broad evaluation design space while systematically selecting criteria and priorities. Second, we compose the CaLM dataset, comprising 126,334 data samples, to provide curated sets of causal targets, adaptations, metrics, and errors, offering extensive coverage for diverse research pursuits. Third, we conduct an extensive evaluation of 28 leading language models on a core set of 92 causal targets, 9 adaptations, 7 metrics, and 12 error types. Fourth, we perform detailed analyses of the evaluation results across various dimensions (e.g., adaptation, scale). Fifth, we present 50 high-level empirical findings across 9 dimensions (e.g., model), providing valuable guidance for future language model development. Finally, we develop a multifaceted platform, including a website, leaderboards, datasets, and toolkits, to support scalable and adaptable assessments. We envision CaLM as an ever-evolving benchmark for the community, systematically updated with new causal targets, adaptations, models, metrics, and error types to reflect ongoing research advancements. Project website is at https://opencausalab.github.io/CaLM.
Submitted 1 May, 2024;
originally announced May 2024.
-
In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol
Authors:
Wei-Han Wang,
Chin-Yuan Yeh,
Hsi-Wen Chen,
De-Nian Yang,
Ming-Syan Chen
Abstract:
As deep generative models advance, we anticipate deepfakes achieving "perfection," generating no discernible artifacts or noise. However, current deepfake detectors, intentionally or inadvertently, rely on such artifacts for detection, as they are exclusive to deepfakes and absent in genuine examples. To bridge this gap, we introduce the Rebalanced Deepfake Detection Protocol (RDDP) to stress-test detectors under balanced scenarios where genuine and forged examples bear similar artifacts. We offer two RDDP variants: RDDP-WHITEHAT uses white-hat deepfake algorithms to create 'self-deepfakes,' genuine portrait videos that retain the resemblance of the underlying identity yet carry artifacts similar to those of deepfake videos; RDDP-SURROGATE employs surrogate functions (e.g., Gaussian noise) to process both genuine and forged examples, introducing equivalent noise, thereby sidestepping the need for deepfake algorithms.
Towards detecting perfect deepfake videos that align with genuine ones, we present ID-Miner, a detector that identifies the puppeteer behind the disguise by focusing on motion over artifacts or appearances. As an identity-based detector, it authenticates videos by comparing them with reference footage. Equipped with the artifact-agnostic loss at frame-level and the identity-anchored loss at video-level, ID-Miner effectively singles out identity signals amidst distracting variations. Extensive experiments comparing ID-Miner with 12 baseline detectors under both conventional and RDDP evaluations with two deepfake datasets, along with additional qualitative studies, affirm the superiority of our method and the necessity for detectors designed to counter perfect deepfakes.
Submitted 1 May, 2024;
originally announced May 2024.
-
Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection
Authors:
Zhanwei Zhang,
Minghao Chen,
Shuai Xiao,
Liang Peng,
Hengjia Li,
Binbin Lin,
** Li,
Wenxiao Wang,
Boxi Wu,
Deng Cai
Abstract:
Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previous techniques mitigate this by reweighting these boxes as pseudo labels, but these boxes can still poison the training process. To resolve this problem, in this paper, we propose a novel pseudo label refinery framework. Specifically, in the selection process, to improve the reliability of pseudo boxes, we propose a complementary augmentation strategy. This strategy involves either removing all points within an unreliable box or replacing it with a high-confidence box. Moreover, the point numbers of instances in high-beam datasets are considerably higher than those in low-beam datasets, also degrading the quality of pseudo labels during the training process. We alleviate this issue by generating additional proposals and aligning RoI features across different domains. Experimental results demonstrate that our method effectively enhances the quality of pseudo labels and consistently surpasses the state-of-the-art methods on six autonomous driving benchmarks. Code will be available at https://github.com/Zhanwei-Z/PERE.
Submitted 30 April, 2024;
originally announced April 2024.
-
G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction
Authors:
Zhanwei Zhang,
Zishuo Hua,
Minghao Chen,
Wei Lu,
Binbin Lin,
Deng Cai,
Wenxiao Wang
Abstract:
Predicting future trajectories of traffic agents accurately holds substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from the accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in kinematically infeasible predictions. To address these issues, in this paper, we propose G2LTraj, a plug-and-play global-to-local generation approach for trajectory prediction. Specifically, we generate a series of global key steps that uniformly cover the entire future time range. Subsequently, the local intermediate steps between the adjacent key steps are recursively filled in. In this way, we prevent the accumulated error from propagating beyond the adjacent key steps. Moreover, to boost the kinematical feasibility, we not only introduce the spatial constraints among key steps but also strengthen the temporal constraints among the intermediate steps. Finally, to ensure the optimal granularity of key steps, we design a selectable granularity strategy that caters to each predicted trajectory. Our G2LTraj significantly improves the performance of seven existing trajectory predictors across the ETH, UCY and nuScenes datasets. Experimental results demonstrate its effectiveness. Code will be available at https://github.com/Zhanwei-Z/G2LTraj.
Submitted 30 April, 2024;
originally announced April 2024.
-
Tunable Collective Excitations in Epitaxial Perovskite Nickelates
Authors:
Mengxia Sun,
Xu He,
Mingyao Chen,
Chi Sin Tang,
Xiongfang Liu,
Liang Dai,
Jishan Liu,
Zhigang Zeng,
Shuo Sun,
Mark B. H. Breese,
Chuanbing Cai,
Yingge Du,
Le Wang,
Andrew T. S. Wee,
Xinmao Yin
Abstract:
The formation of plasmons through the collective excitation of charge density has generated intense discussions, offering insights to fundamental sciences and potential applications. While the underlying physical principles have been well-established, the effects of many-body interactions and orbital hybridization on plasmonic dynamics remain understudied. In this work, we present the observation of conventional metallic and correlated plasmons in epitaxial La1-xSrxNiO3 (LSNO) films with varying Sr doping concentrations (x = 0, 0.125, 0.25), unveiling their intriguing evolution. Unlike samples at other doping concentrations, the x = 0.125 intermediate doping sample does not exhibit the correlated plasmons despite showing high optical conductivity. Through a comprehensive experimental investigation using spectroscopic ellipsometry and X-ray absorption spectroscopy, the O2p-Ni3d orbital hybridization for LSNO with a doping concentration of x = 0.125 is found to be significantly enhanced, alongside a considerable weakening of its effective correlation U*. These factors account for the absence of correlated plasmons and the high optical conductivity observed in LSNO (0.125). Our results underscore the profound impact of orbital hybridization on the electronic structure and the formation of plasmons in strongly-correlated systems. This in turn suggests that LSNO could serve as a promising alternative material in optoelectronic devices.
Submitted 1 June, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Authors:
Minghao Chen,
Iro Laina,
Andrea Vedaldi
Abstract:
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. The established paradigm to solve this problem is to use a 2D image generator or editor to guide the 3D editing process. However, this is often slow as it requires updating a computationally expensive 3D representation such as a neural radiance field, and doing so using contradictory guidance from a 2D model which is inherently not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method that addresses these issues in two ways. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. We do so by utilizing a training-free approach which integrates cues from the underlying 3D geometry of the scene. Second, given a multi-view consistent edited sequence of images of the object, we directly and efficiently optimize the 3D object representation, which is based on 3D Gaussian Splatting. Because it does not require applying edits incrementally and iteratively, DGE is significantly more efficient than existing approaches, and comes with other perks such as allowing selective editing of parts of the scene.
Submitted 29 April, 2024;
originally announced April 2024.
-
Uncovering an Interfacial Band Resulting from Orbital Hybridization in Nickelate Heterostructures
Authors:
Mingyao Chen,
Huimin Liu,
Xu He,
Minjuan Li,
Chi Sin Tang,
Mengxia Sun,
Krishna Prasad Koirala,
Mark E. Bowden,
Yangyang Li,
Xiongfang Liu,
Difan Zhou,
Shuo Sun,
Mark B. H. Breese,
Chuanbing Cai,
Yingge Du,
Andrew T. S. Wee,
Le Wang,
Xinmao Yin
Abstract:
The interaction of atomic orbitals at the interface of perovskite oxide heterostructures has been investigated for its profound impact on the band structures and electronic properties, giving rise to unique electronic states and a variety of tunable functionalities. In this study, we conducted an extensive investigation of the optical and electronic properties of epitaxial NdNiO3 thin films grown on a series of single crystal substrates. Unlike films synthesized on other substrates, NdNiO3 on SrTiO3 (NNO/STO) gives rise to a unique band structure which features an additional unoccupied band situated above the Fermi level. Our comprehensive investigation, which incorporated a wide array of experimental techniques and density functional theory calculations, revealed that the emergence of the interfacial band structure is primarily driven by the orbital hybridization between Ti 3d orbitals of the STO substrate and O 2p orbitals of the NNO thin film. Furthermore, exciton peaks have been detected in the optical spectra of the NNO/STO film, attributable to the pronounced electron-electron (e-e) and electron-hole (e-h) interactions propagating from the STO substrate into the NNO film. These findings underscore the substantial influence of interfacial orbital hybridization on the electronic structure of oxide thin-films, thereby offering key insights into tuning their interfacial properties.
Submitted 29 April, 2024;
originally announced April 2024.
-
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
Authors:
Zhengze Xu,
Mengting Chen,
Zhao Wang,
Linyu Xing,
Zhonghua Zhai,
Nong Sang,
**song Lan,
Shuai Xiao,
Changxin Gao
Abstract:
Video try-on is a challenging task and has not been well tackled in previous works. The main obstacle lies in preserving the details of the clothing and modeling the coherent motions simultaneously. Faced with those difficulties, we address video try-on by proposing a diffusion-based framework named "Tunnel Try-on." The core idea is excavating a "focus tunnel" in the input video that gives close-up shots around the clothing regions. We zoom in on the region in the tunnel to better preserve the fine details of the clothing. To generate coherent motions, we first leverage the Kalman filter to construct smooth crops in the focus tunnel and inject the position embedding of the tunnel into attention layers to improve the continuity of the generated videos. In addition, we develop an environment encoder to extract the context information outside the tunnels as supplementary cues. Equipped with these techniques, Tunnel Try-on keeps the fine details of the clothing and synthesizes stable and smooth videos. Demonstrating significant advancements, Tunnel Try-on could be regarded as the first attempt toward the commercial-level application of virtual try-on in videos.
Submitted 26 April, 2024;
originally announced April 2024.
-
Automatic Speech Recognition System-Independent Word Error Rate Estimation
Authors:
Chanho Park,
Mingjie Chen,
Thomas Hain
Abstract:
Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-domain data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by relative 17.58% and 18.21%, respectively, on Switchboard and CALLHOME. The performance was further improved when the WER of the training set was close to the WER of the evaluation dataset.
Submitted 26 April, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Multi-scale HSV Color Feature Embedding for High-fidelity NIR-to-RGB Spectrum Translation
Authors:
Huiyu Zhai,
Mo Chen,
Xingxing Yang,
Gusheng Kang
Abstract:
The NIR-to-RGB spectral domain translation is a formidable task due to the inherent spectral mapping ambiguities within NIR inputs and RGB outputs. Thus, existing methods fail to reconcile the tension between maintaining texture detail fidelity and achieving diverse color variations. In this paper, we propose a Multi-scale HSV Color Feature Embedding Network (MCFNet) that decomposes the mapping process into three sub-tasks, including NIR texture maintenance, coarse geometry reconstruction, and RGB color prediction. Thus, we propose three key modules for each corresponding sub-task: the Texture Preserving Block (TPB), the HSV Color Feature Embedding Module (HSV-CFEM), and the Geometry Reconstruction Module (GRM). These modules contribute to our MCFNet methodically tackling spectral translation through a series of escalating resolutions, progressively enriching images with color and texture fidelity in a scale-coherent fashion. The proposed MCFNet demonstrates substantial performance gains over the NIR image colorization task. Code is released at: https://github.com/AlexYangxx/MCFNet.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry
Authors:
Yining Huang,
Keke Tang,
Meilian Chen,
Boyuan Wang
Abstract:
Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in various medical applications, detailing their evaluation based on performance in tasks such as clinical diagnosis, medical text data processing, information retrieval, data analysis, and educational content generation. The subsequent sections offer a comprehensive discussion on the evaluation methods and metrics employed, including models, evaluators, and comparative experiments. We further examine the benchmarks and datasets utilized in these evaluations, providing a categorized description of benchmarks for tasks like question answering, summarization, information extraction, bioinformatics, information retrieval and general comprehensive benchmarks. This structure ensures a thorough understanding of how LLMs are assessed for their effectiveness, accuracy, usability, and ethical alignment in the medical domain. ...
Submitted 29 May, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Development of Pattern Recognition Validation for Boson Sampling
Authors:
Yang Ji,
Yongzheng Wu,
Shi Wang,
Jie Hou,
Meiling Chen,
Ming Ni
Abstract:
Boson sampling is one of the most attractive quantum computation models to demonstrate the quantum computational advantage. However, this aim may be hard to realize considering noise sources such as photon distinguishability. Inspired by the Bayesian validation developed to evaluate whether photon distinguishability is too high to demonstrate the quantum computational advantage, we develop the pattern recognition validation for boson sampling. Based on clusters constructed with the k-means++ method, the distribution of test values changes nearly monotonically with photon indistinguishability, especially when the photons are close to being indistinguishable. We analyze the intrinsic data structure by calculating probability distributions and mean 2-norm distances of the sorted outputs. Approximation algorithms are also used to show how the data structure changes with photon distinguishability.
Submitted 23 April, 2024;
originally announced April 2024.
-
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
Authors:
Haozhe Cheng,
Cheng Ju,
Haicheng Wang,
**xiang Liu,
Mengting Chen,
Qiang Hu,
Xiaoyun Zhang,
Yanfeng Wang
Abstract:
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action Recognition (OVAR) has recently gained increasing attention with the development of vision-language pre-training. To enable generalization to arbitrary classes, existing methods treat class labels as text descriptions, then formulate OVAR as evaluating embedding similarity between visual samples and textual classes. However, one crucial issue is completely ignored: the class descriptions given by users may be noisy, e.g., misspellings and typos, limiting the real-world practicality of vanilla OVAR. To fill the research gap, this paper pioneers the evaluation of existing methods by simulating multi-level noises of various types, and reveals their poor robustness. To tackle the noisy OVAR task, we further propose a novel DENOISER framework, covering two parts: generation and discrimination. Concretely, the generative part denoises noisy class-text names via a decoding process, i.e., it proposes text candidates, then uses inter-modal and intra-modal information to vote for the best. In the discriminative part, we use vanilla OVAR models to assign visual samples to class-text names, thus obtaining more semantics. For optimization, we alternately iterate between generative and discriminative parts for progressive refinements. The denoised text classes help OVAR models classify visual samples more accurately; in return, classified visual samples help improve denoising. On three datasets, we carry out extensive experiments to show our superior robustness, and thorough ablations to dissect the effectiveness of each component.
Submitted 23 April, 2024;
originally announced April 2024.
-
Gradient Guidance for Diffusion Models: An Optimization Perspective
Authors:
Yingqing Guo,
Hui Yuan,
Yukang Yang,
Minshuo Chen,
Mengdi Wang
Abstract:
Diffusion models have demonstrated empirical successes in various applications and can be adapted to task-specific needs via guidance. This paper introduces a form of gradient guidance for adapting or fine-tuning diffusion models towards user-specified optimization objectives. We study the theoretical aspects of a guided score-based sampling process, linking the gradient-guided diffusion model to first-order optimization. We show that adding gradient guidance to the sampling process of a pre-trained diffusion model is essentially equivalent to solving a regularized optimization problem, where the regularization term acts as a prior determined by the pre-training data. Diffusion models are able to learn the data's latent subspace; however, explicitly adding the gradient of an external objective function to the sampling process would jeopardize the structure in generated samples. To remedy this issue, we consider a modified form of gradient guidance based on a forward prediction loss, which leverages the pre-trained score function to preserve the latent structure in generated samples. We further consider an iteratively fine-tuned version of gradient-guided diffusion where one can query gradients at newly generated data points and update the score network using new samples. This process mimics a first-order optimization iteration in expectation, for which we prove an O(1/K) convergence rate to the global optimum when the objective function is concave.
Submitted 23 April, 2024;
originally announced April 2024.
-
Scalable cyclic transformation of orbital angular momentum modes based on a nonreciprocal Mach-Zehnder interferometer
Authors:
Y. F. Yang,
M. Y. Chen,
F. P. Li,
Y. P. Ruan,
Z. X. Li,
M. Xiao,
H. Zhang,
K. Y. Xia
Abstract:
The orbital angular momentum (OAM) of photons provides a pivotal resource for carrying out high-dimensional classical and quantum information processing due to its unique discrete high-dimensional nature. The cyclic transformation of a set of orthogonal OAM modes is an essential building block for universal high-dimensional information processing. Its realization in the quantum domain is the universal quantum Pauli-X gate. In this work, we experimentally demonstrate a cyclic transformation of six OAM modes with an averaged efficiency higher than 96% by exploiting a nonreciprocal Mach-Zehnder interferometer. Our system is simple and can, in principle, be scaled to more modes. By improving phase stabilization and inputting quantum photonic states, this method can perform a universal single-photon quantum Pauli-X gate, thus paving the way for scalable high-dimensional quantum computation.
Submitted 22 April, 2024;
originally announced April 2024.
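The cyclic transformation mentioned in this abstract has a compact matrix form: on d orthogonal modes it sends mode l to mode (l + 1) mod d, which is the generalized Pauli-X (shift) operator. The sketch below only illustrates that mapping for d = 6. The mode labels 0 to 5 and the helper names `cyclic_shift_matrix` and `apply_matrix` are assumptions for the example, and it says nothing about the interferometer used in the experiment.

```python
def cyclic_shift_matrix(d: int) -> list[list[int]]:
    """Return the d x d shift matrix X with X|l> = |(l + 1) mod d>."""
    # Column `col` has its single 1 in row (col + 1) % d.
    return [[1 if row == (col + 1) % d else 0 for col in range(d)]
            for row in range(d)]


def apply_matrix(matrix: list[list[int]], state: list[int]) -> list[int]:
    """Multiply a square matrix by a state vector of mode amplitudes."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


# Hypothetical labelling: six modes indexed 0..5, input entirely in mode 0.
X = cyclic_shift_matrix(6)
state = [1, 0, 0, 0, 0, 0]
shifted = apply_matrix(X, state)  # amplitude moves from mode 0 to mode 1
```

Applying the operator d times returns every mode to itself, which is what makes the transformation cyclic.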
-
High-Dimensional Two-Photon Quantum Controlled Phase-Flip Gate
Authors:
Mingyuan Chen,
Jiangshan Tang,
Miao Cai,
Franco Nori,
Keyu Xia
Abstract:
High-dimensional quantum systems have been used to reveal interesting fundamental physics and to improve information capacity and noise resilience in quantum information processing. However, it remains a significant challenge to realize universal two-photon quantum gates in high dimensions with high success probability. Here, by considering an ion-cavity QED system, we theoretically propose, to the best of our knowledge, the first high-dimensional, deterministic and universal two-photon quantum gate. By using an optical cavity embedded with a single trapped 40Ca+ ion, we achieve a high average fidelity larger than 98% for a quantum controlled phase-flip gate in four-dimensional space, spanned by photonic spin angular momenta and orbital angular momenta. Our proposed system can be an essential building block for high-dimensional quantum information processing, and also provides a platform for studying high-dimensional cavity QED.
Submitted 22 April, 2024;
originally announced April 2024.
-
Mapping Wireless Networks into Digital Reality through Joint Vertical and Horizontal Learning
Authors:
Zifan Zhang,
Mingzhe Chen,
Zhaohui Yang,
Yuchen Liu
Abstract:
In recent years, the complexity of 5G and beyond wireless networks has escalated, prompting a need for innovative frameworks to facilitate flexible management and efficient deployment. The concept of digital twins (DTs) has emerged as a solution to enable real-time monitoring, predictive configurations, and decision-making processes. While existing works primarily focus on leveraging DTs to optimize wireless networks, a detailed mapping methodology for creating virtual representations of network infrastructure and properties is still lacking. In this context, we introduce VH-Twin, a novel time-series data-driven framework that effectively maps wireless networks into digital reality. VH-Twin distinguishes itself through complementary vertical twinning (V-twinning) and horizontal twinning (H-twinning) stages, followed by a periodic clustering mechanism used to virtualize network regions based on their distinct geological and wireless characteristics. Specifically, V-twinning exploits distributed learning techniques to initialize a global twin model collaboratively from virtualized network clusters. H-twinning, on the other hand, is implemented with an asynchronous mapping scheme that dynamically updates twin models in response to network or environmental changes. Leveraging real-world wireless traffic data within a cellular wireless network, comprehensive experiments are conducted to verify that VH-Twin can effectively construct, deploy, and maintain network DTs. Parametric analysis also offers insights into how to strike a balance between twinning efficiency and model accuracy at scale.
Submitted 22 April, 2024;
originally announced April 2024.
-
Autoencoder-assisted Feature Ensemble Net for Incipient Faults
Authors:
Mingxuan Gao,
Min Wang,
Maoyin Chen
Abstract:
Deep learning has shown great power in the field of fault detection. However, for incipient faults with tiny amplitude, the detection performance of the current deep learning networks (DLNs) is not satisfactory. Even if prior information about the faults is utilized, DLNs cannot successfully detect faults 3, 9 and 15 in the Tennessee Eastman process (TEP). These faults are notoriously difficult to detect, and the field lacks effective detection techniques for them. In this work, we propose Autoencoder-assisted Feature Ensemble Net (AE-FENet): a deep feature ensemble framework that uses the unsupervised autoencoder to conduct the feature transformation. Compared with the principal component analysis (PCA) technique adopted in the original Feature Ensemble Net (FENet), the autoencoder can mine more exact features on incipient faults, which results in better detection performance of AE-FENet. With the same kinds of basic detectors, AE-FENet achieves a state-of-the-art average accuracy over 96% on faults 3, 9 and 15 in TEP, which represents a significant enhancement in performance compared to other methods. Extensive experiments have been conducted to extend our framework, proving that DLNs can be utilized efficiently within this architecture.
Submitted 22 April, 2024;
originally announced April 2024.
-
A Platform for All-optical Thomson/Compton Scattering with Versatile Parameters
Authors:
Siyu Chen,
Wenchao Yan,
Mingyang Zhu,
Yaojun Li,
Xichen Hu,
Hao Xu,
Jie Feng,
Xulei Ge,
Wenzhao Wang,
Guangwei Lu,
Mingxuan Wei,
Lin Lu,
Xiaojun Huang,
Boyuan Li,
Xiaohui Yuan,
Feng Liu,
Min Chen,
Liming Chen,
Jie Zhang
Abstract:
A dual-beam platform for all-optical electron-photon scattering, or Thomson/Compton scattering, with adjustable collision-angle and parameter tuning ability has been developed, which, in principle, can be used for the verification of strong-field quantum electrodynamics effects. Combining this platform with a 200 TW Ti:Sapphire laser system, we demonstrated the generation of inverse Compton scattering X/gamma-rays with tunable energies from tens of keV to MeV. The polarization of X/gamma radiation was manipulated by controlling the polarization of the scattering laser. In the near future, by combining this experimental platform with multi-PW laser facilities, it is proposed to experimentally generate X/gamma radiation with orbital angular momentum for nuclear isomer excitation, and more importantly, to explore the regime transition from nonlinear Thomson scattering to nonlinear Compton scattering, eventually to demonstrate the verification of theories on extremely strong field quantum electrodynamics effects.
Submitted 22 April, 2024;
originally announced April 2024.
-
Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\toωX(3872)$ to $e^+e^-\toωχ_{c1}(ωχ_{c2})$ to be $σ_{ωX(3872)}/σ_{ωχ_{c1}}~(σ_{ωX(3872)}/σ_{ωχ_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~ (5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process of $e^+e^-\toγX(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\toγX(3872)$ to $e^+e^-\toωX(3872)$ is set as $σ_{γX(3872)}/σ_{ωX(3872)}<0.23$ at 90\% confidence level.
Submitted 21 April, 2024;
originally announced April 2024.
-
E-QGen: Educational Lecture Abstract-based Question Generation System
Authors:
Mao-Siang Chen,
An-Zi Yen
Abstract:
To optimize the preparation process for educators in academic lectures and associated question-and-answer sessions, this paper presents E-QGen, a lecture abstract-based question generation system. Given a lecture abstract, E-QGen generates potential student inquiries. The questions suggested by our system are expected to not only facilitate teachers in preparing answers in advance but also enable them to supply additional resources when necessary.
Submitted 21 April, 2024;
originally announced April 2024.
-
Nonreciprocal PT-symmetric phase transition in a non-Hermitian chiral quantum optical system
Authors:
Miao Cai,
Jiang-Shan Tang,
Ming-Yuan Chen,
Keyu Xia
Abstract:
Phase transitions, non-Hermiticity and nonreciprocity play central roles in fundamental physics. However, the triple interplay of these three fields is lacking in the quantum domain. Here, we show a nonreciprocal parity-time-symmetric phase transition in a non-Hermitian chiral quantum electrodynamical system, caused by the directional system dissipation. In remarkable contrast to previously reported nonreciprocal phase transitions, the nonreciprocal parity-time-symmetric phases appear even when the atom-resonator coupling is reciprocal. Nonreciprocal photon blockade is obtained in the nonreciprocal phase region. These results may deepen the fundamental insight into nonreciprocal and non-Hermitian quantum physics, and also open a new door for unconventional quantum manipulation.
Submitted 21 April, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance
Authors:
Zeke Xia,
Ming Hu,
Dengke Yan,
Xiaofei Xie,
Tianlin Li,
Anran Li,
Junlong Zhou,
Mingsong Chen
Abstract:
Federated Learning (FL), as a promising distributed machine learning paradigm, has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL are seriously limited due to the presence of stragglers and data imbalance across massive AIoT devices, respectively. To address the above challenges, we present a novel asynchronous FL approach named CaBaFL, which includes a hierarchical Cache-based aggregation mechanism and a feature Balance-guided device selection strategy. CaBaFL maintains multiple intermediate models simultaneously for local training. The hierarchical cache-based aggregation mechanism enables each intermediate model to be trained on multiple devices to align the training time and mitigate the straggler issue. Specifically, each intermediate model is stored in a low-level cache for local training and, when it has been trained by sufficient local devices, it is stored in a high-level cache for aggregation. To address the problem of imbalanced data, the feature balance-guided device selection strategy in CaBaFL adopts the activation distribution as a metric, which enables each intermediate model to be trained across devices with totally balanced data distributions before aggregation. Experimental results show that compared with the state-of-the-art FL methods, CaBaFL achieves up to 9.26X training acceleration and 19.71\% accuracy improvements.
Submitted 19 April, 2024;
originally announced April 2024.
-
KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting
Authors:
Zeke Xia,
Ming Hu,
Dengke Yan,
Ruixuan Liu,
Anran Li,
Xiaofei Xie,
Mingsong Chen
Abstract:
Although Split Federated Learning (SFL) is good at enabling knowledge sharing among resource-constrained clients, it suffers from the problem of low training accuracy due to the neglect of data heterogeneity and catastrophic forgetting. To address this issue, we propose a novel SFL approach named KoReA-SFL, which adopts a multi-model aggregation mechanism to alleviate gradient divergence caused by heterogeneous data and a knowledge replay strategy to deal with catastrophic forgetting. Specifically, in KoReA-SFL cloud servers (i.e., fed server and main server) maintain multiple branch model portions rather than a global portion for local training and an aggregated master-model portion for knowledge sharing among branch portions. To avoid catastrophic forgetting, the main server of KoReA-SFL selects multiple assistant devices for knowledge replay according to the training data distribution of each server-side branch-model portion. Experimental results obtained from non-IID and IID scenarios demonstrate that KoReA-SFL significantly outperforms conventional SFL methods (by up to 23.25\% test accuracy improvement).
Submitted 19 April, 2024;
originally announced April 2024.
-
JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA
Authors:
Zeyu Zhang,
Xuyin Qi,
Mingxi Chen,
Guangxi Li,
Ryan Pham,
Ayub Qassim,
Ella Berry,
Zhibin Liao,
Owen Siggs,
Robert Mclaughlin,
Jamie Craig,
Minh-Son To
Abstract:
The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promising development in rapidly and effectively screening eye-related lesions, offering the potential for diagnosing sleep-related disorders. To bridge this gap, our paper presents three key contributions. Firstly, we propose JointViT, a novel model based on the Vision Transformer architecture, incorporating a joint loss function for supervision. Secondly, we introduce a balancing augmentation technique during data preprocessing to improve the model's performance, particularly on the long-tail distribution within the OCTA dataset. Lastly, through comprehensive experiments on the OCTA dataset, our proposed method significantly outperforms other state-of-the-art methods, achieving improvements of up to 12.28% in overall accuracy. This advancement lays the groundwork for the future utilization of OCTA in diagnosing sleep-related disorders. See project website https://steve-zeyu-zhang.github.io/JointViT
Submitted 18 April, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Photonic indistinguishability characterization and optimization for cavity-based single-photon source
Authors:
Miao Cai,
Mingyuan Chen,
Jiangshan Tang,
Keyu Xia
Abstract:
Indistinguishability of single photons from independent sources is critically important for scalable quantum technologies. We provide a comprehensive comparison of single-photon indistinguishability of different kinds of cavity quantum electrodynamics (CQED) systems by numerically simulating Hong-Ou-Mandel (HOM) two-photon interference. We find that the CQED system using natural atoms exhibits superiority in indistinguishability, benefiting from the inherently identical features. Moreover, $Λ$-type three-level atoms show essential robustness against variation of various system parameters because they exploit the two ground states with considerably smaller decay rates for single-photon generation. Furthermore, a machine learning-based framework is proposed to significantly and robustly improve single-photon indistinguishability for two non-identical CQED systems. This work may pave the way for designing and engineering reliable and scalable photon-based quantum technologies.
Submitted 17 April, 2024;
originally announced April 2024.
-
Offset Unlearning for Large Language Models
Authors:
James Y. Huang,
Wenxuan Zhou,
Fei Wang,
Fred Morstatter,
Sheng Zhang,
Hoifung Poon,
Muhao Chen
Abstract:
Despite the strong capabilities of Large Language Models (LLMs) to acquire knowledge from their training corpora, the memorization of sensitive information in the corpora such as copyrighted, harmful, and private content has led to ethical and legal concerns. In response to these challenges, unlearning has emerged as a potential remedy for LLMs affected by problematic training data. However, previous unlearning techniques are either not applicable to black-box LLMs due to required access to model internal weights, or violate data protection principles by retaining sensitive data for inference-time correction. We propose $δ$-unlearning, an offset unlearning framework for black-box LLMs. Instead of tuning the black-box LLM itself, $δ$-unlearning learns the logit offset needed for unlearning by contrasting the logits from a pair of smaller models. Experiments demonstrate that $δ$-unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks. $δ$-unlearning also effectively incorporates different unlearning algorithms, making our approach a versatile solution to adapting various existing unlearning algorithms to black-box LLMs.
Submitted 16 April, 2024;
originally announced April 2024.