-
Class-Aware Cartilage Segmentation for Autonomous US-CT Registration in Robotic Intercostal Ultrasound Imaging
Authors:
Zhongliang Jiang,
Yunfeng Kang,
Yuan Bi,
Xuesong Li,
Chenyang Li,
Nassir Navab
Abstract:
Ultrasound imaging has been widely used in clinical examinations owing to the advantages of being portable, real-time, and radiation-free. Considering the potential of extensive deployment of autonomous examination systems in hospitals, robotic US imaging has attracted increased attention. However, due to the inter-patient variations, it is still challenging to have an optimal path for each patien…
▽ More
Ultrasound imaging has been widely used in clinical examinations owing to the advantages of being portable, real-time, and radiation-free. Considering the potential of extensive deployment of autonomous examination systems in hospitals, robotic US imaging has attracted increased attention. However, due to the inter-patient variations, it is still challenging to have an optimal path for each patient, particularly for thoracic applications with limited acoustic windows, e.g., intercostal liver imaging. To address this problem, a class-aware cartilage bone segmentation network with geometry-constraint post-processing is presented to capture patient-specific rib skeletons. Then, a dense skeleton graph-based non-rigid registration is presented to map the intercostal scanning path from a generic template to individual patients. By explicitly considering the high-acoustic impedance bone structures, the transferred scanning path can be precisely located in the intercostal space, enhancing the visibility of internal organs by reducing the acoustic shadow. To evaluate the proposed approach, the final path map** performance is validated on five distinct CTs and two volunteer US data, resulting in ten pairs of CT-US combinations. Results demonstrate that the proposed graph-based registration method can robustly and precisely map the path from CT template to individual patients (Euclidean error: $2.21\pm1.11~mm$).
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Authors:
Yiqun Lin,
Jiewen Yang,
Hualiang Wang,
Xinpeng Ding,
Wei Zhao,
Xiaomeng Li
Abstract:
Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT…
▽ More
Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT reconstruction is more challenging due to the increased dimensionality caused by the measurement process based on cone-shaped X-ray beams. As a 2D-to-3D reconstruction problem, although implicit neural representations have been introduced to enable efficient training, only local features are considered and different views are processed equally in previous works, resulting in spatial inconsistency and poor performance on complicated anatomies. To this end, we propose C^2RV by leveraging explicit multi-scale volumetric representations to enable cross-regional learning in the 3D space. Additionally, the scale-view cross-attention module is introduced to adaptively aggregate multi-scale and multi-view features. Extensive experiments demonstrate that our C^2RV achieves consistent and significant improvement over previous state-of-the-art methods on datasets with diverse anatomy.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
A Comprehensive Study of Quantum Arithmetic Circuits
Authors:
Siyi Wang,
Xiufan Li,
Wei Jie Bryan Lee,
Suman Deb,
Eugene Lim,
Anupam Chattopadhyay
Abstract:
In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention.…
▽ More
In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention. Despite extensive exploration of various designs in the existing literature, researchers remain keen on develo** novel designs and improving existing ones.
In this review article, we aim to provide a systematically organized and easily comprehensible overview of the current state-of-the-art in quantum arithmetic circuits. Specifically, this study covers fundamental operations such as addition, subtraction, multiplication, division and modular exponentiation. We delve into the detailed quantum implementations of these prominent designs and evaluate their efficiency considering various objectives. We also discuss potential applications of presented arithmetic circuits and suggest future research directions.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Time delay of fast radio burst population with respect to the star formation history
Authors:
Hai-Nan Lin,
Xin-Yi Li,
Rui Zou
Abstract:
In spite of significant progress in the research of fast radio bursts (FRBs) in recent decade, their origin is still under extensive debate. Investigation on the population of FRBs can provide new insight into this interesting problem. In this paper, based on the first CHIME/FRB catalog, we construct a Bayesian framework to analyze the FRB population, with the selection effect of the CHIME telesco…
▽ More
In spite of significant progress in the research of fast radio bursts (FRBs) in recent decade, their origin is still under extensive debate. Investigation on the population of FRBs can provide new insight into this interesting problem. In this paper, based on the first CHIME/FRB catalog, we construct a Bayesian framework to analyze the FRB population, with the selection effect of the CHIME telescope being properly taken into account. The energy function is modeled as the power-law with an exponential cutoff. Four redshift distribution models are considered, i.e., the star formation history (SFH) model, and three time-delayed models (Gaussian delay, log-normal delay, and power-law delay). The free parameters are simultaneously constrained using Bayesian inference method, and the Bayesian information criterion (BIC) is used in model comparison. According to BIC, the log-normal delay model fits the data best. The power-law delay model and Gaussian delay model can also give reasonable fits, although they are not as good as the log-normal delay model. However, the SFH model is strongly disfavored compared with the three time-delayed models. The energy function is tightly constrained and is almost independent of the redshift models, with the best-fitting power-law index $α\approx 1.8$, and cut-off energy $\log(E_c/{\rm erg})\approx 42$. The FRB population shows on average $3\sim 5$ billion years time delay with respect to the SFH. Therefore, the hypothesis that the FRB population traces the SFH is conclusively ruled out.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
The impact of nodes of information dissemination on epidemic spreading in dynamic multiplex networks
Authors:
Minyu Feng,
Xiangxi Li,
Yuhan Li,
Qin Li
Abstract:
Epidemic spreading processes on dynamic multiplex networks provide a more accurate description of natural spreading processes than those on single layered networks. To describe the influence of different individuals in the awareness layer on epidemic spreading, we propose a two-layer network-based epidemic spreading model, including some individuals who neglect the epidemic, and we explore how ind…
▽ More
Epidemic spreading processes on dynamic multiplex networks provide a more accurate description of natural spreading processes than those on single layered networks. To describe the influence of different individuals in the awareness layer on epidemic spreading, we propose a two-layer network-based epidemic spreading model, including some individuals who neglect the epidemic, and we explore how individuals with different properties in the awareness layer will affect the spread of epidemics. The two-layer network model is divided into an information transmission layer and a disease spreading layer. Each node in the layer represents an individual with different connections in different layers. Individuals with awareness will be infected with a lower probability compared to unaware individuals, which corresponds to the various epidemic prevention measures in real life. We adopt the micro-Markov chain approach to analytically derive the threshold for the proposed epidemic model, which demonstrates that the awareness layer affects the threshold of disease spreading. We then explore how individuals with different properties would affect the disease spreading process through extensive Monte Carlo numerical simulations. We find that individuals with high centrality in the awareness layer would significantly inhibit the transmission of infectious diseases. Additionally, we propose conjectures and explanations for the approximately linear effect of individuals with low centrality in the awareness layer on the number of infected individuals.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Dynamic properties of a class of van der Pol-Duffing oscillators
Authors:
Yelei Kuang,
Xuemei Li
Abstract:
In this paper, we study the existence of bifurcation of a van der Pol-Duffing oscillator with quintic terms and its quasi-periodic solutions by means of qualitative and bifurcation theories. Firstly, we analyze the autonomous system and find that it has two kinds of local bifurcations and a global bifurcation: pitchfork bifurcation, Hopf bifurcation, homoclinic bifurcation. It is worth noting that…
▽ More
In this paper, we study the existence of bifurcation of a van der Pol-Duffing oscillator with quintic terms and its quasi-periodic solutions by means of qualitative and bifurcation theories. Firstly, we analyze the autonomous system and find that it has two kinds of local bifurcations and a global bifurcation: pitchfork bifurcation, Hopf bifurcation, homoclinic bifurcation. It is worth noting that the disappearance of the homoclinic orbit is synchronized with the emergence of a large limit cycle. Then, by discussing the stability of equilibria at infinity and the orientation of the trajectory, the existence and stability of limit circles of the autonomous system are analyzed by combining the Poincaré-Bendixson theorem and the index theory. The global phase portrait and the numerical simulation of the autonomous system in different parameter values are given. Finally, the existence of periodic and quasi-periodic solutions to periodic forced system is proved by a KAM theorem.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model
Authors:
**yin Chen,
Xiaoming Zhao,
Haibin Zheng,
Xiao Li,
Sheng Xiang,
Haifeng Guo
Abstract:
Benefiting from well-trained deep neural networks (DNNs), model compression have captured special attention for computing resource limited equipment, especially edge devices. Knowledge distillation (KD) is one of the widely used compression techniques for edge deployment, by obtaining a lightweight student model from a well-trained teacher model released on public platforms. However, it has been e…
▽ More
Benefiting from well-trained deep neural networks (DNNs), model compression have captured special attention for computing resource limited equipment, especially edge devices. Knowledge distillation (KD) is one of the widely used compression techniques for edge deployment, by obtaining a lightweight student model from a well-trained teacher model released on public platforms. However, it has been empirically noticed that the backdoor in the teacher model will be transferred to the student model during the process of KD. Although numerous KD methods have been proposed, most of them focus on the distillation of a high-performing student model without robustness consideration. Besides, some research adopts KD techniques as effective backdoor mitigation tools, but they fail to perform model compression at the same time. Consequently, it is still an open problem to well achieve two objectives of robust KD, i.e., student model's performance and backdoor mitigation. To address these issues, we propose RobustKD, a robust knowledge distillation that compresses the model while mitigating backdoor based on feature variance. Specifically, RobustKD distinguishes the previous works in three key aspects: (1) effectiveness: by distilling the feature map of the teacher model after detoxification, the main task performance of the student model is comparable to that of the teacher model; (2) robustness: by reducing the characteristic variance between the teacher model and the student model, it mitigates the backdoor of the student model under backdoored teacher model scenario; (3) generic: RobustKD still has good performance in the face of multiple data models (e.g., WRN 28-4, Pyramid-200) and diverse DNNs (e.g., ResNet50, MobileNet).
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
Gaussian Representation for Deformable Image Registration
Authors:
Jihe Li,
Fabian Zhang,
Xia Li,
Tianhao Zhang,
Ye Zhang,
Joachim Buhmann
Abstract:
Deformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3…
▽ More
Deformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3D volumetric medical images, producing a displacement vector field (DVF) across all volumetric positions. The movement of individual voxels is derived using linear blend skinning (LBS) through localized interpolation of transformations associated with neighboring Gaussians. This interpolation strategy not only simplifies the determination of voxel motions but also acts as an effective regularization technique. Our approach incorporates a unified optimization process through backpropagation, enabling iterative learning of both the parameters of the 3D Gaussians and their transformations. Additionally, the density of Gaussians is adjusted adaptively during the learning phase to accommodate varying degrees of motion complexity. We validated our approach on the 4D-CT lung DIR-Lab and cardiac ACDC datasets, achieving an average target registration error (TRE) of 1.06 mm within a much-improved processing time of 2.43 seconds for the DIR-Lab dataset over existing methods, demonstrating significant advancements in both accuracy and efficiency.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
Authors:
Wang Dai,
Xiaofei Li,
Archontis Politis,
Tuomas Virtanen
Abstract:
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances or compact, highly directional microphone arrays where speaker or microphone positions cha…
▽ More
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances or compact, highly directional microphone arrays where speaker or microphone positions change over time. Current mask-based methods often fix the reference channel during training, which makes it not possible to adaptively select the reference channel for optimal performance. To address this problem, we introduce an adaptive approach for selecting the optimal reference channel. Our method leverages a multi-channel masking-based scheme, where multiple masked signals are combined to generate a single-channel output signal. This enhanced signal is then used for loss calculation, while the reference clean speech is adjusted based on the highest scale-invariant signal-to-distortion ratio (SI-SDR). The experimental results on the Spear challenge simulated dataset D4 demonstrate the superiority of our proposed method over the conventional approach of using a fixed reference channel with single-channel masking.
△ Less
Submitted 11 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
A Quantum Neural Network-Based Approach to Power Quality Disturbances Detection and Recognition
Authors:
Guo-Dong Li,
Hai-Yan He,
Yue Li,
Xin-Hao Li,
Hao Liu,
Qing-Le Wang,
Long Cheng
Abstract:
Power quality disturbances (PQDs) significantly impact the stability and reliability of power systems, necessitating accurate and efficient detection and recognition methods. While numerous classical algorithms for PQDs detection and recognition have been extensively studied and applied, related work in the quantum domain is still in its infancy. In this paper, an improved quantum neural networks…
▽ More
Power quality disturbances (PQDs) significantly impact the stability and reliability of power systems, necessitating accurate and efficient detection and recognition methods. While numerous classical algorithms for PQDs detection and recognition have been extensively studied and applied, related work in the quantum domain is still in its infancy. In this paper, an improved quantum neural networks (QNN) model for PQDs detection and recognition is proposed. Specifically, the model constructs a quantum circuit comprising data qubits and ancilla qubits. Classical data is transformed into quantum data by embedding it into data qubits via the encoding layer. Subsequently, parametric quantum gates are utilized to form the variational layer, which facilitates qubit information transformation, thereby extracting essential feature information for detection and recognition. The expected value is obtained by measuring ancilla qubits, enabling the completion of disturbance classification based on this expected value. An analysis reveals that the runtime and space complexities of the QNN are $O\left ( poly\left ( N \right ) \right )$ and $O\left ( N \right )$, respectively. Extensive experiments validate the feasibility and superiority of the proposed model in PQD detection and recognition. The model achieves accuracies of 99.75\%, 97.85\% and 95.5\% in experiments involving the detection of disturbances, recognition of seven single disturbances, and recognition of ten mixed disturbances, respectively. Additionally, noise simulation and comparative experiments demonstrate that the proposed model exhibits robust anti-noise capabilities, requires few training parameters, and maintains high accuracy.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes
Authors:
Yan Huang,
Xiang Li,
Yipeng Shen,
Niao He,
**ming Xu
Abstract:
In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes. To address this challenge, we propose D-AdaST, a Distributed Adaptive minimax method with Stepsize Tracking. The key strategy is to employ an adaptive stepsize tracking protocol involving the transmission of two ex…
▽ More
In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes. To address this challenge, we propose D-AdaST, a Distributed Adaptive minimax method with Stepsize Tracking. The key strategy is to employ an adaptive stepsize tracking protocol involving the transmission of two extra (scalar) variables. This protocol ensures the consistency among stepsizes of nodes, eliminating the steady-state error due to the lack of coordination of stepsizes among nodes that commonly exists in vanilla distributed adaptive methods, and thus guarantees exact convergence. For nonconvex-strongly-concave distributed minimax problems, we characterize the specific transient times that ensure time-scale separation of stepsizes and quasi-independence of networks, leading to a near-optimal convergence rate of $\tilde{\mathcal{O}} \left( ε^{-\left( 4+δ\right)} \right)$ for any small $δ> 0$, matching that of the centralized counterpart. To our best knowledge, D-AdaST is the first distributed adaptive method achieving near-optimal convergence without knowing any problem-dependent parameters for nonconvex minimax problems. Extensive experiments are conducted to validate our theoretical results.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (643 additional authors not shown)
Abstract:
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for…
▽ More
Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ are found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at 90% confidence level.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images
Authors:
Yimian Dai,
Minrui Zou,
Yuxuan Li,
Xiang Li,
Kang Ni,
Jian Yang
Abstract:
Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR ima…
▽ More
Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR image denoising, we propose DenoDet, a network aided by explicit frequency domain transform to calibrate convolutional biases and pay more attention to high-frequencies, forming a natural multi-scale subspace representation to detect targets from the perspective of multi-subspace denoising. We design TransDeno, a dynamic frequency domain attention module that performs as a transform domain soft thresholding operation, dynamically denoising across subspaces by preserving salient target signals and attenuating noise. To adaptively adjust the granularity of subspace processing, we also propose a deformable group fully-connected layer (DeGroFC) that dynamically varies the group conditioned on the input features. Without bells and whistles, our plug-and-play TransDeno sets state-of-the-art scores on multiple SAR target detection datasets. The code is available at https://github.com/GrokCV/GrokSAR.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Evidentially Calibrated Source-Free Time-Series Domain Adaptation with Temporal Imputation
Authors:
Mohamed Ragab,
Peiliang Gong,
Emadeldeen Eldele,
Wenyu Zhang,
Min Wu,
Chuan-Sheng Foo,
Daoqiang Zhang,
Xiaoli Li,
Zhenghua Chen
Abstract:
Source-free domain adaptation (SFDA) aims to adapt a model pre-trained on a labeled source domain to an unlabeled target domain without access to source data, preserving the source domain's privacy. While SFDA is prevalent in computer vision, it remains largely unexplored in time series analysis. Existing SFDA methods, designed for visual data, struggle to capture the inherent temporal dynamics of…
▽ More
Source-free domain adaptation (SFDA) aims to adapt a model pre-trained on a labeled source domain to an unlabeled target domain without access to source data, preserving the source domain's privacy. While SFDA is prevalent in computer vision, it remains largely unexplored in time series analysis. Existing SFDA methods, designed for visual data, struggle to capture the inherent temporal dynamics of time series, hindering adaptation performance. This paper proposes MAsk And imPUte (MAPU), a novel and effective approach for time series SFDA. MAPU addresses the critical challenge of temporal consistency by introducing a novel temporal imputation task. This task involves randomly masking time series signals and leveraging a dedicated temporal imputer to recover the original signal within the learned embedding space, bypassing the complexities of noisy raw data. Notably, MAPU is the first method to explicitly address temporal consistency in the context of time series SFDA. Additionally, it offers seamless integration with existing SFDA methods, providing greater flexibility. We further introduce E-MAPU, which incorporates evidential uncertainty estimation to address the overconfidence issue inherent in softmax predictions. To achieve that, we leverage evidential deep learning to obtain a better-calibrated pre-trained model and adapt the target encoder to map out-of-support target samples to a new feature representation closer to the source domain's support. This fosters better alignment, ultimately enhancing adaptation performance. Extensive experiments on five real-world time series datasets demonstrate that both MAPU and E-MAPU achieve significant performance gains compared to existing methods. These results highlight the effectiveness of our proposed approaches for tackling various time series domain adaptation problems.
△ Less
Submitted 12 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Authors:
Philip Anastassiou,
Jiawei Chen,
Jitong Chen,
Yuanzhe Chen,
Zhuo Chen,
Ziyi Chen,
Jian Cong,
Lelai Deng,
Chuang Ding,
Lu Gao,
Mingqing Gong,
Peisong Huang,
Qingqing Huang,
Zhiying Huang,
Yuanyuan Huo,
Dongya Jia,
Chumin Li,
Feiya Li,
Hui Li,
Jiaxin Li,
Xiaoyang Li,
Xingxing Li,
Lin Liu,
Shouda Liu,
Sichao Liu
, et al. (21 additional authors not shown)
Abstract:
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…
▽ More
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at \url{https://bytedancespeech.github.io/seedtts_tech_report}.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
GrootVL: Tree Topology is All You Need in State Space Model
Authors:
Yicheng Xiao,
Lin Song,
Shaoli Huang,
Jiangshan Wang,
Siyu Song,
Yixiao Ge,
Xiu Li,
Ying Shan
Abstract:
The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency. However, constrained by the inherent geometric constraints of sequences, it still falls short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree t…
▽ More
The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency. However, constrained by the inherent geometric constraints of sequences, it still falls short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree topology based on spatial relationships and input features. Then, feature propagation is performed based on this graph, thereby breaking the original sequence constraints to achieve stronger representation capabilities. Additionally, we introduce a linear complexity dynamic programming algorithm to enhance long-range interactions without increasing computational cost. GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks. Extensive experiments demonstrate that our method significantly outperforms existing structured state space models on image classification, object detection and segmentation. Besides, by fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods
Authors:
Junwen Qiu,
Bohao Ma,
Xiao Li,
Andre Milzarek
Abstract:
We propose a novel analysis framework for non-descent-type optimization methodologies in nonconvex scenarios based on the Kurdyka-Lojasiewicz property. Our framework allows covering a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not…
▽ More
We propose a novel analysis framework for non-descent-type optimization methodologies in nonconvex scenarios based on the Kurdyka-Lojasiewicz property. Our framework allows covering a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not require access to full (deterministic) gradient information. We leverage this framework to establish, for the first time, iterate convergence and the corresponding rates for the decentralized gradient method and federated averaging under mild assumptions. Furthermore, based on the new analysis techniques, we show the convergence of the random reshuffling and stochastic gradient descent method without necessitating typical a priori bounded iterates assumptions.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
MaskSR: Masked Language Model for Full-band Speech Restoration
Authors:
Xu Li,
Qirui Wang,
Xiaoyu Liu
Abstract:
Speech restoration aims at restoring high quality speech in the presence of a diverse set of distortions. Although several deep learning paradigms have been studied for this task, the power of the recently emerging language models has not been fully explored. In this paper, we propose MaskSR, a masked language model capable of restoring full-band 44.1 kHz speech jointly considering noise, reverb,…
▽ More
Speech restoration aims at restoring high quality speech in the presence of a diverse set of distortions. Although several deep learning paradigms have been studied for this task, the power of the recently emerging language models has not been fully explored. In this paper, we propose MaskSR, a masked language model capable of restoring full-band 44.1 kHz speech jointly considering noise, reverb, clip**, and low bandwidth. MaskSR works with discrete acoustic tokens extracted using a pre-trained neural codec. During training, MaskSR is optimized to predict randomly masked tokens extracted from the high quality target speech, conditioned on the corrupted speech with various distortions. During inference, MaskSR reconstructs the target speech tokens with efficient iterative sampling. Extensive experiments show that MaskSR obtains competitive results on both the full-band speech restoration task and also on sub-tasks compared with a wide range of models.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Simulation of DAMPE silicon microstrip detectors in the $\rm Allpix^{2}$ framework
Authors:
Yu-Xin Cui,
Xiang Li,
Shen Wang,
Chuan Yue,
Qiang Wan,
Shi-Jun Lei,
Guan-Wen Yuan,
Yi-Ming Hu,
Jia-Ju Wei,
Jian-Hua Guo
Abstract:
Silicon strip detectors have been widely utilized in space experiments for gamma-ray and cosmic-ray detections thanks to their high spatial resolution and stable performance. For a silicon micro-strip detector, the Monte Carlo simulation is recognized as a practical and cost-effective approach to verify the detector performance. In this study, a technique for the simulation of the silicon micro-st…
▽ More
Silicon strip detectors have been widely utilized in space experiments for gamma-ray and cosmic-ray detections thanks to their high spatial resolution and stable performance. For a silicon micro-strip detector, the Monte Carlo simulation is recognized as a practical and cost-effective approach to verify the detector performance. In this study, a technique for the simulation of the silicon micro-strip detector with the $\rm Allpix^{2}$ framework is developed. By incorporating the electric field into the particle transport simulation based on Geant4, this framework could precisely emulate the carrier drift in the silicon micro-strip detector. The simulation results are validated using the beam test data as well as the flight data of the DAMPE experiment, which suggests that the $\rm Allpix^{2}$ framework is a powerful tool to obtain the performance of the silicon micro-strip detector.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
An efficient Wasserstein-distance approach for reconstructing jump-diffusion processes using parameterized neural networks
Authors:
Mingtao Xia,
Xiangting Li,
Qi**g Shen,
Tom Chou
Abstract:
We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. The…
▽ More
We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention
Authors:
Yang Liu,
Xiaofei Li,
Jun Zhang,
Shengze Hu,
Jun Lei
Abstract:
The increasing difficulty in accurately detecting forged images generated by AIGC(Artificial Intelligence Generative Content) poses many risks, necessitating the development of effective methods to identify and further locate forged areas. In this paper, to facilitate research efforts, we construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model. Our goal i…
▽ More
The increasing difficulty in accurately detecting forged images generated by AIGC(Artificial Intelligence Generative Content) poses many risks, necessitating the development of effective methods to identify and further locate forged areas. In this paper, to facilitate research efforts, we construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model. Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization. Specifically, it relies on a dual-attention mechanism to adaptively fuse multi-modal image features in depth, followed by a multi-branch interaction network to thoroughly interact image features at different scales and improve detector performance by leveraging dependencies between layers. Additionally, we extract more sensitive noise fingerprints to obtain more prominent forged artifact features in the forged areas. Extensive experiments validate the effectiveness of our approach, demonstrating significant performance improvements compared to state-of-the-art methods for forged image detection and localization.The code and dataset will be released in the future.
△ Less
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Authors:
Thanh-Dat Truong,
Xin Li,
Bhiksha Raj,
Jackson Cothren,
Khoa Luu
Abstract:
The Vision-Language Foundation Model has recently shown outstanding performance in various perception learning tasks. The outstanding performance of the vision-language model mainly relies on large-scale pre-training datasets and different data augmentation techniques. However, the domain generalization problem of the vision-language foundation model needs to be addressed. This problem has limited…
▽ More
The Vision-Language Foundation Model has recently shown outstanding performance in various perception learning tasks. The outstanding performance of the vision-language model mainly relies on large-scale pre-training datasets and different data augmentation techniques. However, the domain generalization problem of the vision-language foundation model needs to be addressed. This problem has limited the generalizability of the vision-language foundation model to unknown data distributions. In this paper, we introduce a new simple but efficient Diffusion Sampling approach to Domain Generalization (ED-SAM) to improve the generalizability of the vision-language foundation model. Our theoretical analysis in this work reveals the critical role and relation of the diffusion model to domain generalization in the vision-language foundation model. Then, based on the insightful analysis, we introduce a new simple yet effective Transport Transformation to diffusion sampling method. It can effectively generate adversarial samples to improve the generalizability of the foundation model against unknown data distributions. The experimental results on different scales of vision-language pre-training datasets, including CC3M, CC12M, and LAION400M, have consistently shown State-of-the-Art performance and scalability of the proposed ED-SAM approach compared to the other recent methods.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Measurements of the branching fractions of semileptonic $D^{+}_s$ decays via $e^+e^-\to D_s^{*+}D_s^{*-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are…
▽ More
We measure the absolute branching fractions of semileptonic $D^+_s$ decays via the $e^+e^-\to D_s^{*+}D_s^{*-}$ process using $e^+e^-$ collision data corresponding to an integrated luminosity of $10.64~\mathrm{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies between 4.237 and 4.699 GeV. The branching fractions are ${\mathcal B}(D_s^+\to ηe^+ν_e)=(2.35\pm0.11_{\rm stat}\pm 0.10_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to η^\prime e^+ν_e)=(0.82\pm0.09_{\rm stat}\pm 0.04_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to φe^+ν_e)=(2.21\pm0.16_{\rm stat}\pm 0.11_{\rm syst})\%,$ ${\mathcal B}(D_s^+\to f_0(980) e^+ν_e,f_0(980)\toπ^+π^-)=(0.15\pm0.02_{\rm stat}\pm 0.01_{\rm syst})\%,$ ${\mathcal
B}(D_s^+\to K^0 e^+ν_e)=(0.24\pm0.04_{\rm stat}\pm 0.01_{\rm syst})\%,$ and ${\mathcal B}(D_s^+\to K^{*0} e^+ν_e)=(0.19\pm0.03_{\rm stat}\pm 0.01_{\rm syst})\%.$ These results are consistent with those measured via the $e^+e^-\to D_s^{*\pm}D_s^{\mp}$ process by BESIII and CLEO. The hadronic transition form factors $D^+_s\to ηe^+ν_e$, $D^+_s\to η^\prime e^+ν_e$, and $D^+_s\to K^0 e^+ν_e$ at four-momentum transfer squared $q^2$ = 0 are determined to be $f^η_+(0) = 0.482 \pm 0.011_{\rm stat} \pm 0.009_{\rm syst}\pm0.004_{\rm input},$ $f^{η^{\prime}}_+(0) = 0.562 \pm 0.031_{\rm stat} \pm 0.014_{\rm
syst}\pm0.003_{\rm input},$ and $f^{K^0}_+(0) = 0.624 \pm 0.052_{\rm
stat} \pm 0.013_{\rm syst}\pm0.002_{\rm input}.$
△ Less
Submitted 4 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Symmetry enforced solution of the many-body Schrödinger equation with deep neural network
Authors:
Zhe Li,
Zixiang Lu,
Ruichen Li,
Xuelan Wen,
Xiang Li,
Liwei Wang,
Ji Chen,
Weiluo Ren
Abstract:
The integration of deep neural networks with the Variational Monte Carlo (VMC) method has marked a significant advancement in solving the Schrödinger equation. In this work, we enforce spin symmetry in the neural network-based VMC calculation with modified optimization target. Our method is designed to solve for the ground state and multiple excited states with target spin symmetry at a low comput…
▽ More
The integration of deep neural networks with the Variational Monte Carlo (VMC) method has marked a significant advancement in solving the Schrödinger equation. In this work, we enforce spin symmetry in the neural network-based VMC calculation with modified optimization target. Our method is designed to solve for the ground state and multiple excited states with target spin symmetry at a low computational cost. It predicts accurate energies while maintaining the correct symmetry in strongly correlated systems, even in cases where different spin states are nearly degenerate. Our approach also excels at spin-gap calculations, including the singlet-triplet gap in biradical systems, which is of high interest in photochemistry. Overall, this work establishes a robust framework for efficiently calculating various quantum states with specific spin symmetry in correlated systems, paving the way for novel discoveries in quantum science.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
BACON: Bayesian Optimal Condensation Framework for Dataset Distillation
Authors:
Zheng Zhou,
Hongbo Zhao,
Guangliang Cheng,
Xiangtai Li,
Shuchang Lyu,
Wenquan Feng,
Qi Zhao
Abstract:
Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyz…
▽ More
Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyzing the DD problem. To address these challenges, we propose the BAyesian optimal CONdensation framework (BACON), which is the first work to introduce the Bayesian theoretical framework to the literature of DD. This framework provides theoretical support for enhancing the performance of DD. Furthermore, BACON formulates the DD problem as the minimization of the expected risk function in joint probability distributions using the Bayesian framework. Additionally, by analyzing the expected risk function for optimal condensation, we derive a numerically feasible lower bound based on specific assumptions, providing an approximate solution for BACON. We validate BACON across several datasets, demonstrating its superior performance compared to existing state-of-the-art methods. For instance, under the IPC-10 setting, BACON achieves a 3.46% accuracy gain over the IDM method on the CIFAR-10 dataset and a 3.10% gain on the TinyImageNet dataset. Our extensive experiments confirm the effectiveness of BACON and its seamless integration with existing methods, thereby enhancing their performance for the DD task. Code and distilled datasets are available at BACON.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
Authors:
Hantao Zhou,
Longxiang Tang,
Rui Yang,
Guanyi Qin,
Yan Zhang,
Runze Hu,
Xiu Li
Abstract:
Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Existing methods typically address these tasks independently due to distinct learning objectives. However, they neglect the underlying interconnectedness of both tasks, which hinders the learning of task-agnostic shared representations for hu…
▽ More
Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Existing methods typically address these tasks independently due to distinct learning objectives. However, they neglect the underlying interconnectedness of both tasks, which hinders the learning of task-agnostic shared representations for human subjective perception. To confront this challenge, we propose Unified vision-language pre-training of Quality and Aesthetics (UniQA), to learn general perceptions of two tasks, thereby benefiting them simultaneously. Addressing the absence of text in the IQA datasets and the presence of textual noise in the IAA datasets, (1) we utilize multimodal large language models (MLLMs) to generate high-quality text descriptions; (2) the generated text for IAA serves as metadata to purify noisy IAA data. To effectively adapt the pre-trained UniQA to downstream tasks, we further propose a lightweight adapter that utilizes versatile cues to fully exploit the extensive knowledge of the pre-trained model. Extensive experiments demonstrate that our approach attains a new state-of-the-art performance on both IQA and IAA tasks, while concurrently showcasing exceptional zero-shot and few-label image assessment capabilities. The source code will be available at https://github.com/zht8506/UniQA.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Causal prompting model-based offline reinforcement learning
Authors:
Xuehui Yu,
Yi Guan,
Rujia Shen,
Xin Li,
Chen Tang,
**gchi Jiang
Abstract:
Model-based offline Reinforcement Learning (RL) allows agents to fully utilise pre-collected datasets without requiring additional or unethical explorations. However, applying model-based offline RL to online systems presents challenges, primarily due to the highly suboptimal (noise-filled) and diverse nature of datasets generated by online systems. To tackle these issues, we introduce the Causal…
▽ More
Model-based offline Reinforcement Learning (RL) allows agents to fully utilise pre-collected datasets without requiring additional or unethical explorations. However, applying model-based offline RL to online systems presents challenges, primarily due to the highly suboptimal (noise-filled) and diverse nature of datasets generated by online systems. To tackle these issues, we introduce the Causal Prompting Reinforcement Learning (CPRL) framework, designed for highly suboptimal and resource-constrained online scenarios. The initial phase of CPRL involves the introduction of the Hidden-Parameter Block Causal Prompting Dynamic (Hip-BCPD) to model environmental dynamics. This approach utilises invariant causal prompts and aligns hidden parameters to generalise to new and diverse online users. In the subsequent phase, a single policy is trained to address multiple tasks through the amalgamation of reusable skills, circumventing the need for training from scratch. Experiments conducted across datasets with varying levels of noise, including simulation-based and real-world offline datasets from the Dnurse APP, demonstrate that our proposed method can make robust decisions in out-of-distribution and noisy environments, outperforming contemporary algorithms. Additionally, we separately verify the contributions of Hip-BCPDs and the skill-reuse strategy to the robustness of performance. We further analyse the visualised structure of Hip-BCPD and the interpretability of sub-skills. We released our source code and the first ever real-world medical dataset for precise medical decision-making tasks.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos
Authors:
Trong-Thuan Nguyen,
Pha Nguyen,
Xin Li,
Jackson Cothren,
Alper Yilmaz,
Khoa Luu
Abstract:
Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually compre…
▽ More
Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually comprehensive and precise collection of predicates that capture the intricate relationships and spatial arrangements among objects. To this end, we propose the novel Cyclic Graph Transformer (CYCLO) approach that allows the model to capture both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The proposed approach also allows one to handle sequences with inherent cyclical patterns and process object relationships in the correct sequential order. Therefore, it can effectively capture periodic and overlap** relationships while minimizing information loss. The extensive experiments on the AeroEye dataset demonstrate the effectiveness of the proposed CYCLO model, demonstrating its potential to perform scene understanding on drone videos. Finally, the CYCLO method consistently achieves State-of-the-Art (SOTA) results on two in-the-wild scene graph generation benchmarks, i.e., PVSG and ASPIRe.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Scalable Ensembling For Mitigating Reward Overoptimisation
Authors:
Ahmed M. Ahmed,
Rafael Rafailov,
Stepan Sharkov,
Xuechen Li,
Sanmi Koyejo
Abstract:
Reinforcement Learning from Human Feedback (RLHF) has enabled significant advancements within language modeling for powerful, instruction-following models. However, the alignment of these models remains a pressing challenge as the policy tends to overfit the learned ``proxy" reward model past an inflection point of utility as measured by a ``gold" reward model that is more performant -- a phenomen…
▽ More
Reinforcement Learning from Human Feedback (RLHF) has enabled significant advancements within language modeling for powerful, instruction-following models. However, the alignment of these models remains a pressing challenge as the policy tends to overfit the learned ``proxy" reward model past an inflection point of utility as measured by a ``gold" reward model that is more performant -- a phenomenon known as overoptimisation. Prior work has mitigated this issue by computing a pessimistic statistic over an ensemble of reward models, which is common in Offline Reinforcement Learning but incredibly costly for language models with high memory requirements, making such approaches infeasible for sufficiently large models. To this end, we propose using a shared encoder but separate linear heads. We find this leads to similar performance as the full ensemble while allowing tremendous savings in memory and time required for training for models of similar size.
△ Less
Submitted 18 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…
▽ More
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Detection of Acetone as a Gas Biomarker for Diabetes Based on Gas Sensor Technology
Authors:
Jiaming Wei,
Tong Liu,
Jipeng Huang,
Xiaowei Li,
Yurui Qi,
Gangyin Luo
Abstract:
With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for dia…
▽ More
With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for diabetes breath analysis. This provides a more readily accepted method for early diabetes prevention and monitoring. Addressing issues such as the invasive nature, disease transmission risks, and complexity of diabetes testing, this study aims to design a diabetes gas biomarker acetone detection system centered around a sensor array using gas sensors and pattern recognition algorithms. The research covers sensor selection, sensor preparation, circuit design, data acquisition and processing, and detection model establishment to accurately identify acetone. Titanium dioxide was chosen as the nano gas-sensitive material to prepare the acetone gas sensor, with data collection conducted using STM32. Filtering was applied to process the raw sensor data, followed by feature extraction using principal component analysis. A recognition model based on support vector machine algorithm was used for qualitative identification of gas samples, while a recognition model based on backpropagation neural network was employed for quantitative detection of gas sample concentrations. Experimental results demonstrated recognition accuracies of 96% and 97.5% for acetone-ethanol and acetone-methanol mixed gases, and 90% for ternary acetone, ethanol, and methanol mixed gases.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
All-sky Guide Star Catalog for CSST
Authors:
Hui-Mei Feng,
Zi-Huang Cao,
Man I Lam,
Ran Li,
Hao Tian,
Da-Yi Yin,
Yuan-Yu Yang,
Xin Zhang,
Dong-Wei Fan,
Yi-Qiao Dong,
Xin-Feng Li,
Wei Wang,
Long Li,
Hugh R. A. Jones,
Yi-Han Tao,
Jia-Lu Nie,
Pei-Pei Wang,
Mao-Yuan Liu,
He-jun Yang,
Chao Liu
Abstract:
The China Space Station Telescope (CSST) is a two-meter space telescope with multiple back-end instruments. The Fine Guidance Sensor (FGS) is an essential subsystem of the CSST Precision Image Stability System to ensure the required absolute pointing accuracy and line-of-sight stabilization. In this study, we construct the Main Guide Star Catalog for FGS. To accomplish this, we utilize the informa…
▽ More
The China Space Station Telescope (CSST) is a two-meter space telescope with multiple back-end instruments. The Fine Guidance Sensor (FGS) is an essential subsystem of the CSST Precision Image Stability System to ensure the required absolute pointing accuracy and line-of-sight stabilization. In this study, we construct the Main Guide Star Catalog for FGS. To accomplish this, we utilize the information about the FGS and object information from the Gaia Data Release 3. We provide an FGS instrument magnitude and exclude variables, binaries, and high proper motion stars from the catalog to ensure uniform FGS guidance capabilities. Subsequently, we generate a HEALPix index, which provides a hierarchical tessellation of the celestial sphere, and employ the Voronoi algorithm to achieve a homogeneous distribution of stars across the catalog. This distribution ensures adequate coverage and sampling of the sky. The performance of the CSST guide star catalog was assessed by simulating the field of view of the FGS according to the CSST mock survey strategy catalog. The analysis of the results indicates that this catalog provides adequate coverage and accuracy. The catalog's performance meets the FGS requirements, ensuring the functioning of the FGS and its guidance capabilities.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Assessing the Adversarial Security of Perceptual Hashing Algorithms
Authors:
Jordan Madden,
Moxanki Bhavsar,
Lhamo Dorje,
Xiaohua Li
Abstract:
Perceptual hashing algorithms (PHAs) are utilized extensively for identifying illegal online content. Given their crucial role in sensitive applications, understanding their security strengths and weaknesses is critical. This paper compares three major PHAs deployed widely in practice: PhotoDNA, PDQ, and NeuralHash, and assesses their robustness against three typical attacks: normal image editing…
▽ More
Perceptual hashing algorithms (PHAs) are utilized extensively for identifying illegal online content. Given their crucial role in sensitive applications, understanding their security strengths and weaknesses is critical. This paper compares three major PHAs deployed widely in practice: PhotoDNA, PDQ, and NeuralHash, and assesses their robustness against three typical attacks: normal image editing attacks, malicious adversarial attacks, and hash inversion attacks. Contrary to prevailing studies, this paper reveals that these PHAs exhibit resilience to black-box adversarial attacks when realistic constraints regarding the distortion and query budget are applied, attributed to the unique property of random hash variations. Moreover, this paper illustrates that original images can be reconstructed from the hash bits, raising significant privacy concerns. By comprehensively exposing their security vulnerabilities, this paper contributes to the ongoing efforts aimed at enhancing the security of PHAs for effective deployment.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
YODAS: Youtube-Oriented Dataset for Audio and Speech
Authors:
Xinjian Li,
Shinnosuke Takamichi,
Takaaki Saeki,
William Chen,
Sayaka Shiota,
Shinji Watanabe
Abstract:
In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset comprising currently over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets. The labeled subsets, including manual or automatic subtitles, facilitate supervised model training. Conversely, the unlabeled subsets ar…
▽ More
In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset comprising currently over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets. The labeled subsets, including manual or automatic subtitles, facilitate supervised model training. Conversely, the unlabeled subsets are apt for self-supervised learning applications. YODAS is distinctive as the first publicly available dataset of its scale, and it is distributed under a Creative Commons license. We introduce the collection methodology utilized for YODAS, which contributes to the large-scale speech dataset construction. Subsequently, we provide a comprehensive analysis of speech, text contained within the dataset. Finally, we describe the speech recognition baselines over the top-15 languages.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
Authors:
Aozhong Zhang,
Naigang Wang,
Yanxia Deng,
Xin Li,
Zi Yang,
Penghang Yin
Abstract:
In this paper, we present a simple optimization-based preprocessing technique called Weight Magnitude Reduction (MagR) to improve the performance of post-training quantization. For each linear layer, we adjust the pre-trained floating-point weights by solving an $\ell_\infty$-regularized optimization problem. This process greatly diminishes the maximum magnitude of the weights and smooths out outl…
▽ More
In this paper, we present a simple optimization-based preprocessing technique called Weight Magnitude Reduction (MagR) to improve the performance of post-training quantization. For each linear layer, we adjust the pre-trained floating-point weights by solving an $\ell_\infty$-regularized optimization problem. This process greatly diminishes the maximum magnitude of the weights and smooths out outliers, while preserving the layer's output. The preprocessed weights are centered more towards zero, which facilitates the subsequent quantization process. To implement MagR, we address the $\ell_\infty$-regularization by employing an efficient proximal gradient descent algorithm. Unlike existing preprocessing methods that involve linear transformations and subsequent post-processing steps, which can introduce significant overhead at inference time, MagR functions as a non-linear transformation, eliminating the need for any additional post-processing. This ensures that MagR introduces no overhead whatsoever during inference. Our experiments demonstrate that MagR achieves state-of-the-art performance on the Llama family of models. For example, we achieve a Wikitext2 perplexity of 5.95 on the LLaMA2-70B model for per-channel INT2 weight quantization without incurring any inference overhead.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Authors:
Xiaoyuan Li,
Wenjie Wang,
Moxin Li,
Junrong Guo,
Yang Zhang,
Fuli Feng
Abstract:
The rapid advancement of Large Language Models (LLMs) in the realm of mathematical reasoning necessitates comprehensive evaluations to gauge progress and inspire future directions. Existing assessments predominantly focus on problem-solving from the examinee perspective, overlooking a dual perspective of examiner regarding error identification and correction. From the examiner perspective, we defi…
▽ More
The rapid advancement of Large Language Models (LLMs) in the realm of mathematical reasoning necessitates comprehensive evaluations to gauge progress and inspire future directions. Existing assessments predominantly focus on problem-solving from the examinee perspective, overlooking a dual perspective of examiner regarding error identification and correction. From the examiner perspective, we define four evaluation tasks for error identification and correction along with a new dataset with annotated error types and steps. We also design diverse prompts to thoroughly evaluate eleven representative LLMs. Our principal findings indicate that GPT-4 outperforms all models, while open-source model LLaMA-2-7B demonstrates comparable abilities to closed-source models GPT-3.5 and Gemini Pro. Notably, calculation error proves the most challenging error type. Moreover, prompting LLMs with the error types can improve the average correction accuracy by 47.9\%. These results reveal potential directions for develo** the mathematical reasoning abilities of LLMs. Our code and dataset is available on https://github.com/LittleCirc1e/EIC.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Artificial General Intelligence (AGI) for the oil and gas industry: a review
Authors:
Jimmy Xuekai Li,
Tiancheng Zhang,
Yiran Zhu,
Zhongwei Chen
Abstract:
Artificial General Intelligence (AGI) is set to profoundly impact the oil and gas industry by introducing unprecedented efficiencies and innovations. This paper explores AGI's foundational principles and its transformative applications, particularly focusing on the advancements brought about by large language models (LLMs) and extensive computer vision systems in the upstream sectors of the indust…
▽ More
Artificial General Intelligence (AGI) is set to profoundly impact the oil and gas industry by introducing unprecedented efficiencies and innovations. This paper explores AGI's foundational principles and its transformative applications, particularly focusing on the advancements brought about by large language models (LLMs) and extensive computer vision systems in the upstream sectors of the industry. The integration of Artificial Intelligence (AI) has already begun resha** the oil and gas landscape, offering enhancements in production optimization, downtime reduction, safety improvements, and advancements in exploration and drilling techniques. These technologies streamline logistics, minimize maintenance costs, automate monotonous tasks, refine decision-making processes, foster team collaboration, and amplify profitability through error reduction and actionable insights extraction. Despite these advancements, the deployment of AI technologies faces challenges, including the necessity for skilled professionals for implementation and the limitations of model training on constrained datasets, which affects the models' adaptability across different contexts. The advent of generative AI, exemplified by innovations like ChatGPT and the Segment Anything Model (SAM), heralds a new era of high-density innovation. These developments highlight a shift towards natural language interfaces and domain-knowledge-driven AI, promising more accessible and tailored solutions for the oil and gas industry. This review articulates the vast potential AGI holds for tackling complex operational challenges within the upstream oil and gas industry, requiring near-human levels of intelligence. We discussed the promising applications, the hurdles of large-scale AGI model deployment, and the necessity for domain-specific knowledge in maximizing the benefits of these technologies.
△ Less
Submitted 11 June, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology
Authors:
**gyu Zhang,
** Cao,
**ghao Chang,
Xin** Li,
Houze Liu,
Zhenglin Li
Abstract:
This research aims to explore the application of deep learning in autonomous driving computer vision technology and its impact on improving system performance. By using advanced technologies such as convolutional neural networks (CNN), multi-task joint learning methods, and deep reinforcement learning, this article analyzes in detail the application of deep learning in image recognition, real-time…
▽ More
This research aims to explore the application of deep learning in autonomous driving computer vision technology and its impact on improving system performance. By using advanced technologies such as convolutional neural networks (CNN), multi-task joint learning methods, and deep reinforcement learning, this article analyzes in detail the application of deep learning in image recognition, real-time target tracking and classification, environment perception and decision support, and path planning and navigation. Application process in key areas. Research results show that the proposed system has an accuracy of over 98% in image recognition, target tracking and classification, and also demonstrates efficient performance and practicality in environmental perception and decision support, path planning and navigation. The conclusion points out that deep learning technology can significantly improve the accuracy and real-time response capabilities of autonomous driving systems. Although there are still challenges in environmental perception and decision support, with the advancement of technology, it is expected to achieve wider applications and greater capabilities in the future. potential.
△ Less
Submitted 3 June, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Federated Model Heterogeneous Matryoshka Representation Learning
Authors:
Li** Yi,
Han Yu,
Chao Ren,
Gang Wang,
Xiaoguang Liu,
Xiaoxiao Li
Abstract:
Model heterogeneous federated learning (MHeteroFL) enables FL clients to collaboratively train models with heterogeneous structures in a distributed fashion. However, existing MHeteroFL methods rely on training loss to transfer knowledge between the client model and the server model, resulting in limited knowledge exchange. To address this limitation, we propose the Federated model heterogeneous M…
▽ More
Model heterogeneous federated learning (MHeteroFL) enables FL clients to collaboratively train models with heterogeneous structures in a distributed fashion. However, existing MHeteroFL methods rely on training loss to transfer knowledge between the client model and the server model, resulting in limited knowledge exchange. To address this limitation, we propose the Federated model heterogeneous Matryoshka Representation Learning (FedMRL) approach for supervised learning tasks. It adds an auxiliary small homogeneous model shared by clients with heterogeneous local models. (1) The generalized and personalized representations extracted by the two models' feature extractors are fused by a personalized lightweight representation projector. This step enables representation fusion to adapt to local data distribution. (2) The fused representation is then used to construct Matryoshka representations with multi-dimensional and multi-granular embedded representations learned by the global homogeneous model header and the local heterogeneous model header. This step facilitates multi-perspective representation learning and improves model learning capability. Theoretical analysis shows that FedMRL achieves a $O(1/T)$ non-convex convergence rate. Extensive experiments on benchmark datasets demonstrate its superior model accuracy with low communication and computational costs compared to seven state-of-the-art baselines. It achieves up to 8.48% and 24.94% accuracy improvement compared with the state-of-the-art and the best same-category baseline, respectively.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
Authors:
Xuanchen Li,
Yuhao Cheng,
Xingyu Ren,
Haozhe Jia,
Di Xu,
Wenhan Zhu,
Yichao Yan
Abstract:
4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant…
▽ More
4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternative geometry and texture optimization frame-by-frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we can extract dynamic facial meshes in regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves superior results than the current SOTA face reconstruction methods both in the quality of meshes and textures. Project page: https://xuanchenli.github.io/Topo4D/.
△ Less
Submitted 1 July, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Learning Manipulation by Predicting Interaction
Authors:
Jia Zeng,
Qingwen Bu,
Bangjun Wang,
Wenke Xia,
Li Chen,
Hao Dong,
Haoming Song,
Dong Wang,
Di Hu,
** Luo,
Heming Cui,
Bin Zhao,
Xuelong Li,
Yu Qiao,
Hongyang Li
Abstract:
Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical…
▽ More
Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical interaction during the manipulation process, resulting in an inadequate understanding of the relationship between objects and the environment. To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation.Given a pair of keyframes representing the initial and final states, along with language instructions, our algorithm predicts the transition frame and detects the interaction object, respectively. These two learning objectives achieve superior comprehension towards "how-to-interact" and "where-to-interact". We conduct a comprehensive evaluation of several challenging robotic tasks.The experimental results demonstrate that MPI exhibits remarkable improvement by 10% to 64% compared with previous state-of-the-art in real-world robot platforms as well as simulation environments. Code and checkpoints are publicly shared at https://github.com/OpenDriveLab/MPI.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
Contrastive Learning Via Equivariant Representation
Authors:
Sifan Song,
**feng Wang,
Qiaochu Zhao,
Xiang Li,
Dufan Wu,
Angelos Stefanidis,
Jionglong Su,
S. Kevin Zhou,
Quanzheng Li
Abstract:
Invariant-based Contrastive Learning (ICL) methods have achieved impressive performance across various domains. However, the absence of latent space representation for distortion (augmentation)-related information in the latent space makes ICL sub-optimal regarding training efficiency and robustness in downstream tasks. Recent studies suggest that introducing equivariance into Contrastive Learning…
▽ More
Invariant-based Contrastive Learning (ICL) methods have achieved impressive performance across various domains. However, the absence of latent space representation for distortion (augmentation)-related information in the latent space makes ICL sub-optimal regarding training efficiency and robustness in downstream tasks. Recent studies suggest that introducing equivariance into Contrastive Learning (CL) can improve overall performance. In this paper, we rethink the roles of augmentation strategies and equivariance in improving CL efficacy. We propose a novel Equivariant-based Contrastive Learning (ECL) framework, CLeVER (Contrastive Learning Via Equivariant Representation), compatible with augmentation strategies of arbitrary complexity for various mainstream CL methods and model frameworks. Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from data, thereby improving the training efficiency and robustness of baseline models in downstream tasks.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
Energetic Electrons Accelerated and Trapped in a Magnetic Bottle above a Solar Flare Arcade
Authors:
Bin Chen,
Xiangliang Kong,
Sijie Yu,
Chengcai Shen,
Xiaocan Li,
Fan Guo,
Yixian Zhang,
Lindsay Glesener,
Säm Krucker
Abstract:
Where and how flares efficiently accelerate charged particles remains an unresolved question. Recent studies revealed that a "magnetic bottle" structure, which forms near the bottom of a large-scale reconnection current sheet above the flare arcade, is an excellent candidate for confining and accelerating charged particles. However, further understanding its role requires linking the various obser…
▽ More
Where and how flares efficiently accelerate charged particles remains an unresolved question. Recent studies revealed that a "magnetic bottle" structure, which forms near the bottom of a large-scale reconnection current sheet above the flare arcade, is an excellent candidate for confining and accelerating charged particles. However, further understanding its role requires linking the various observational signatures to the underlying coupled plasma and particle processes. Here we present the first study combining multi-wavelength observations with data-informed macroscopic magnetohydrodynamics and particle modeling in a realistic eruptive flare geometry. The presence of an above-the-looptop magnetic bottle structure is strongly supported by the observations, which feature not only a local minimum of magnetic field strength but also abruptly slowing down plasma downflows. It also coincides with a compact hard X-ray source and an extended microwave source that bestrides above the flare arcade. Spatially resolved spectral analysis suggests that nonthermal electrons are highly concentrated in this region. Our model returns synthetic emission signatures that are well-matched to the observations. The results suggest that the energetic electrons are strongly trapped in the magnetic bottle region due to turbulence, with only a small fraction managing to escape. The electrons are primarily accelerated by plasma compression and facilitated by a fast-mode termination shock via the Fermi mechanism. Our results provide concrete support for the magnetic bottle as the primary electron acceleration site in eruptive solar flares. They also offer new insights into understanding the previously reported small population of flare-accelerated electrons entering interplanetary space.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
Paths of A Million People: Extracting Life Trajectories from Wikipedia
Authors:
Ying Zhang,
Xiaofeng Li,
Zhaoyang Liu,
Haipeng Zhang
Abstract:
Notable people's life trajectories have been a focus of study -- the locations and times of various activities, such as birth, death, education, marriage, competition, work, delivering a speech, making a scientific discovery, finishing a masterpiece, and fighting a battle, and how these people interact with others, carry important messages for the broad research related to human dynamics. However,…
▽ More
Notable people's life trajectories have been a focus of study -- the locations and times of various activities, such as birth, death, education, marriage, competition, work, delivering a speech, making a scientific discovery, finishing a masterpiece, and fighting a battle, and how these people interact with others, carry important messages for the broad research related to human dynamics. However, the scarcity of trajectory data in terms of volume, density, and inter-person interactions, limits relevant studies from being comprehensive and interactive. We mine millions of biography pages from Wikipedia and tackle the generalization problem stemming from the variety and heterogeneity of the trajectory descriptions. Our ensemble model COSMOS, which combines the idea of semi-supervised learning and contrastive learning, achieves an F1 score of 85.95%. For this task, we also create a hand-curated dataset, WikiLifeTrajectory, consisting of 8,852 (person, time, location) triplets as ground truth. Besides, we perform an empirical analysis on the trajectories of 8,272 historians to demonstrate the validity of the extracted results. To facilitate the research on trajectory extractions and help the analytical studies to construct grand narratives, we make our code, the million-level extracted trajectories, and the WikiLifeTrajectory dataset publicly available.
△ Less
Submitted 25 May, 2024;
originally announced June 2024.
-
PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
Authors:
Shezheng Song,
Shasha Li,
Shan Zhao,
Chengyu Wang,
Xiaopeng Li,
Jie Yu,
Qian Wan,
Jun Ma,
Tianwei Yan,
Wentao Ma,
Xiaoguang Mao
Abstract:
Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text to…
▽ More
Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text tokens with image patches, leading to misalignment and ineffective image utilization.
In contrast, a pipeline framework first identifies aspects through MATE (Multimodal Aspect Term Extraction) and then aligns these aspects with image patches for sentiment classification (MASC: Multimodal Aspect-Oriented Sentiment Classification). This method is better suited for multimodal scenarios where effective image use is crucial. We present three key observations: (a) MATE and MASC have different feature requirements, with MATE focusing on token-level features and MASC on sequence-level features; (b) the aspect identified by MATE is crucial for effective image utilization; and (c) images play a trivial role in previous MABSA methods due to high noise.
Based on these observations, we propose a pipeline framework that first predicts the aspect and then uses translation-based alignment (TBA) to enhance multimodal semantic consistency for better image utilization. Our method achieves state-of-the-art (SOTA) performance on widely used MABSA datasets Twitter-15 and Twitter-17. This demonstrates the effectiveness of the pipeline approach and its potential to provide valuable insights for future MABSA research.
For reproducibility, the code and checkpoint will be released.
△ Less
Submitted 13 June, 2024; v1 submitted 22 May, 2024;
originally announced June 2024.
-
ULTra-AV: A Unified Longitudinal Trajectory Dataset for Automated Vehicle
Authors:
Hang Zhou,
Ke Ma,
Shixiao Liang,
Xiaopeng Li,
Xiaobo Qu
Abstract:
Automated Vehicles (AVs) promise significant advances in transportation. Critical to these improvements is understanding AVs' longitudinal behavior, relying heavily on real-world trajectory data. Existing open-source trajectory datasets of AV, however, often fall short in refinement, reliability, and completeness, hindering effective performance metrics analysis and model development. This study a…
▽ More
Automated Vehicles (AVs) promise significant advances in transportation. Critical to these improvements is understanding AVs' longitudinal behavior, relying heavily on real-world trajectory data. Existing open-source trajectory datasets of AV, however, often fall short in refinement, reliability, and completeness, hindering effective performance metrics analysis and model development. This study addresses these challenges by creating a Unified Longitudinal TRAjectory dataset for AVs (Ultra-AV) to analyze their microscopic longitudinal driving behaviors. This dataset compiles data from 13 distinct sources, encompassing various AV types, test sites, and experiment scenarios. We established a three-step data processing: 1. extraction of longitudinal trajectory data, 2. general data cleaning, and 3. data-specific cleaning to obtain the longitudinal trajectory data and car-following trajectory data. The validity of the processed data is affirmed through performance evaluations across safety, mobility, stability, and sustainability, along with an analysis of the relationships between variables in car-following models. Our work not only furnishes researchers with standardized data and metrics for longitudinal AV behavior studies but also sets guidelines for data collection and model development.
△ Less
Submitted 16 May, 2024;
originally announced June 2024.
-
Non-uniqueness of weak solutions to 2D generalized Navier-Stokes equations
Authors:
Xinliang Li,
Zhong Tan
Abstract:
We study the non-uniqueness of weak solutions for the two-dimensional hyper-dissipative Navier-Stokes equations in the super-critical spaces $L_{t}^γL_{x}^{p}$ when $α\in[1,\frac{3}{2})$, and obtain the conclusion that the non-uniqueness of the weak solutions at the endpoint $(γ,p)=(\infty, \frac{2}{2α-1})$ is sharp in view of the generalized Ladyženskaja-Prodi-Serrin condition by using a differen…
▽ More
We study the non-uniqueness of weak solutions for the two-dimensional hyper-dissipative Navier-Stokes equations in the super-critical spaces $L_{t}^γL_{x}^{p}$ when $α\in[1,\frac{3}{2})$, and obtain the conclusion that the non-uniqueness of the weak solutions at the endpoint $(γ,p)=(\infty, \frac{2}{2α-1})$ is sharp in view of the generalized Ladyženskaja-Prodi-Serrin condition by using a different spatial-temporal building block from [Cheskidov-Luo, Ann. PDE, 9:13 (2023)] and taking advantage of the intermittency of the temporal concentrated function $g_{(k)}$ in an almost optimal way. Our results recover the above 2D non-uniqueness conclusion and extend to the hyper-dissipative case $α\in(1,\frac{3}{2})$.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Effect of antibody levels on the spread of disease in multiple infections
Authors:
Xiangxi Li,
Yuhan Li,
Minyu Feng,
Jürgen Kurths
Abstract:
There are complex interactions between antibody levels and epidemic propagation, the antibody level of an individual influences the probability of infection, and the spread of the virus influences the antibody level of each individual. There exist some viruses that, in their natural state, cause antibody levels in an infected individual to gradually decay. When these antibody levels decay to a cer…
▽ More
There are complex interactions between antibody levels and epidemic propagation, the antibody level of an individual influences the probability of infection, and the spread of the virus influences the antibody level of each individual. There exist some viruses that, in their natural state, cause antibody levels in an infected individual to gradually decay. When these antibody levels decay to a certain point, the individual can be reinfected, such as with COVID 19. To describe their interaction, we introduce a novel mathematical model that incorporates the presence of an antibody retention rate to investigate the infection patterns of individuals who survive multiple infections. The model is composed of a system of stochastic differential equations to derive the equilibrium point and threshold of the model and presents rich experimental results of numerical simulations to further elucidate the propagation properties of the model. We find that the antibody decay rate strongly affects the propagation process, and also that different network structures have different sensitivities to the antibody decay rate, and that changes in the antibody decay rate cause stronger changes in the propagation process in Barabasi Albert networks. Furthermore, we investigate the stationary distribution of the number of infection states and the final antibody levels, and find that they both satisfy the normal distribution, but the standard deviation is small in the Barabasi Albert network. Finally, we explore the effect of individual antibody differences and decay rates on the final population antibody levels, and uncover that individual antibody differences do not affect the final mean antibody levels. The study offers valuable insights for epidemic prevention and control in practical applications.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models
Authors:
Mingda Li,
Xinyu Li,
Yifan Chen,
Wenfeng Xuan,
Weinan Zhang
Abstract:
Although Retrieval-Augmented Large Language Models (RALMs) demonstrate their superiority in terms of factuality, they do not consistently outperform the original retrieval-free Language Models (LMs). Our experiments reveal that this example-level performance inconsistency exists not only between retrieval-augmented and retrieval-free LM but also among different retrievers. To understand this pheno…
▽ More
Although Retrieval-Augmented Large Language Models (RALMs) demonstrate their superiority in terms of factuality, they do not consistently outperform the original retrieval-free Language Models (LMs). Our experiments reveal that this example-level performance inconsistency exists not only between retrieval-augmented and retrieval-free LM but also among different retrievers. To understand this phenomenon, we investigate the degeneration behavior of RALMs and theoretically decompose it into four categories. Further analysis based on our decomposition reveals that the innate difference in knowledge sources and the unpredictable degeneration of the reader model contribute most to the inconsistency. Drawing from our analysis, we introduce Ensemble of Retrievers (EoR), a trainable framework that can adaptively retrieve from different knowledge sources and effectively decrease unpredictable reader errors. Our experiments on Open Domain Question Answering show that EoR substantially improves performance over the RALM with a single retriever by considerably reducing inconsistent behaviors.
△ Less
Submitted 4 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Search for $e^{+}e^{-}\toη'ψ(2S)$ at center-of-mass energies from 4.66 to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence lev…
▽ More
Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence level are determined.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.