Search | arXiv e-print repository

Cross section measurement of $e^+e^-\to ηψ(2S)$ and search for $e^+e^-\toη\tilde{X}(3872)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass en… ▽ More The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass energies, upper limits at the 90\% confidence level on the cross section for $e^+e^-\toηψ(2S)$ and on the product of the $e^+e^-\toη\tilde{X}(3872)$ cross section with the branching fraction of $\tilde{X}(3872)\toπ^+π^- J/ψ$ are reported. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16378 [pdf, other]

Play to Your Strengths: Collaborative Intelligence of Conventional Recommender Models and Large Language Models

Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Chuhan Wu, Bo Chen, Ruiming Tang, Weinan Zhang, Yong Yu

Abstract: The rise of large language models (LLMs) has opened new opportunities in Recommender Systems (RSs) by enhancing user behavior modeling and content understanding. However, current approaches that integrate LLMs into RSs solely utilize either LLM or conventional recommender model (CRM) to generate final recommendations, without considering which data segments LLM or CRM excel in. To fill in this gap… ▽ More The rise of large language models (LLMs) has opened new opportunities in Recommender Systems (RSs) by enhancing user behavior modeling and content understanding. However, current approaches that integrate LLMs into RSs solely utilize either LLM or conventional recommender model (CRM) to generate final recommendations, without considering which data segments LLM or CRM excel in. To fill in this gap, we conduct experiments on MovieLens-1M and Amazon-Books datasets, and compare the performance of a representative CRM (DCNv2) and an LLM (LLaMA2-7B) on various groups of data samples. Our findings reveal that LLMs excel in data segments where CRMs exhibit lower confidence and precision, while samples where CRM excels are relatively challenging for LLM, requiring substantial training data and a long training time for comparable performance. This suggests potential synergies in the combination between LLM and CRM. Motivated by these insights, we propose Collaborative Recommendation with conventional Recommender and Large Language Model (dubbed \textit{CoReLLa}). In this framework, we first jointly train LLM and CRM and address the issue of decision boundary shifts through alignment loss. Then, the resource-efficient CRM, with a shorter inference time, handles simple and moderate samples, while LLM processes the small subset of challenging samples for CRM. Our experimental results demonstrate that CoReLLa outperforms state-of-the-art CRM and LLM methods significantly, underscoring its effectiveness in recommendation tasks. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.16131 [pdf, other]

Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, ** Wei, Badong Chen

Abstract: DETR-like methods have significantly increased detection performance in an end-to-end manner. The mainstream two-stage frameworks of them perform dense self-attention and select a fraction of queries for sparse cross-attention, which is proven effective for improving performance but also introduces a heavy computational burden and high dependence on stable query selection. This paper demonstrates… ▽ More DETR-like methods have significantly increased detection performance in an end-to-end manner. The mainstream two-stage frameworks of them perform dense self-attention and select a fraction of queries for sparse cross-attention, which is proven effective for improving performance but also introduces a heavy computational burden and high dependence on stable query selection. This paper demonstrates that suboptimal two-stage selection strategies result in scale bias and redundancy due to the mismatch between selected queries and objects in two-stage initialization. To address these issues, we propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries, for a better trade-off between computational efficiency and precision. The filtering process overcomes scale bias through a novel scale-independent salience supervision. To compensate for the semantic misalignment among queries, we introduce elaborate query refinement modules for stable two-stage initialization. Based on above improvements, the proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, +4.4% AP on three challenging task-specific detection datasets, as well as 49.2% AP on COCO 2017 with less FLOPs. The code is available at https://github.com/xiuqhou/Salience-DETR. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.16015 [pdf, other]

MQE: Unleashing the Power of Interaction with Multi-agent Quadruped Environment

Authors: Ziyan Xiong, Bo Chen, Shiyu Huang, Wei-Wei Tu, Zhaofeng He, Yang Gao

Abstract: The advent of deep reinforcement learning (DRL) has significantly advanced the field of robotics, particularly in the control and coordination of quadruped robots. However, the complexity of real-world tasks often necessitates the deployment of multi-robot systems capable of sophisticated interaction and collaboration. To address this need, we introduce the Multi-agent Quadruped Environment (MQE),… ▽ More The advent of deep reinforcement learning (DRL) has significantly advanced the field of robotics, particularly in the control and coordination of quadruped robots. However, the complexity of real-world tasks often necessitates the deployment of multi-robot systems capable of sophisticated interaction and collaboration. To address this need, we introduce the Multi-agent Quadruped Environment (MQE), a novel platform designed to facilitate the development and evaluation of multi-agent reinforcement learning (MARL) algorithms in realistic and dynamic scenarios. MQE emphasizes complex interactions between robots and objects, hierarchical policy structures, and challenging evaluation scenarios that reflect real-world applications. We present a series of collaborative and competitive tasks within MQE, ranging from simple coordination to complex adversarial interactions, and benchmark state-of-the-art MARL algorithms. Our findings indicate that hierarchical reinforcement learning can simplify task learning, but also highlight the need for advanced algorithms capable of handling the intricate dynamics of multi-agent interactions. MQE serves as a step** stone towards bridging the gap between simulation and practical deployment, offering a rich environment for future research in multi-agent systems and robot learning. For open-sourced code and more details of MQE, please refer to https://ziyanx02.github.io/multiagent-quadruped-environment/ . △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Open-source code is available at https://github.com/ziyanx02/multiagent-quadruped-environment

arXiv:2403.15069 [pdf, other]

Allspark: Workload Orchestration for Visual Transformers on Processing In-Memory Systems

Authors: Mengke Ge, Junpeng Wang, Binhan Chen, Yingjian Zhong, Haitao Du, Song Chen, Yi Kang

Abstract: The advent of Transformers has revolutionized computer vision, offering a powerful alternative to convolutional neural networks (CNNs), especially with the local attention mechanism that excels at capturing local structures within the input and achieve state-of-the-art performance. Processing in-memory (PIM) architecture offers extensive parallelism, low data movement costs, and scalable memory ba… ▽ More The advent of Transformers has revolutionized computer vision, offering a powerful alternative to convolutional neural networks (CNNs), especially with the local attention mechanism that excels at capturing local structures within the input and achieve state-of-the-art performance. Processing in-memory (PIM) architecture offers extensive parallelism, low data movement costs, and scalable memory bandwidth, making it a promising solution to accelerate Transformer with memory-intensive operations. However, the crucial challenge lies in efficiently deploying the entire model onto a resource-limited PIM system while parallelizing each transformer block with potentially many computational branches based on local attention mechanisms. We present Allspark, which focuses on workload orchestration for visual Transformers on PIM systems, aiming at minimizing inference latency. Firstly, to fully utilize the massive parallelism of PIM, Allspark empolys a finer-grained partitioning scheme for computational branches, and format a systematic layout and interleaved dataflow with maximized data locality and reduced data movement. Secondly, Allspark formulates the scheduling of the complete model on a resource-limited distributed PIM system as an integer linear programming (ILP) problem. Thirdly, as local-global data interactions exhibit complex yet regular dependencies, Allspark provides a greedy-based map** method to allocate computational branches onto the PIM system and minimize NoC communication costs. Extensive experiments on 3D-stacked DRAM-based PIM systems show that Allspark brings 1.2x-24.0x inference speedup for various visual Transformers over baselines, and that Allspark-enriched PIM system yields average speedups of 2.3x and energy savings of 20x-55x over Nvidia V100 GPU. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: The article is currently under review by IEEE Transactions on Computers, and has been submitted to HPCA'2024 and ISCA'2024

arXiv:2403.14998 [pdf, other]

Precise measurement of the $e^+e^-\to D_s^+D_s^-$ cross sections at center-of-mass energies from threshold to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel… ▽ More Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel analysis and model tests, which are critical to understand vector charmonium-like states with masses between 4 and 5~GeV. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures, published to PRL

arXiv:2403.14268 [pdf]

Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

Authors: PeiYing Lee, HauYun Guo, Berlin Chen

Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Tra… ▽ More End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Transformer encoders at the lower layer of EEND-EDA model to enhance the effect of self-attention modules using speaker activity information. The results evaluated on public dataset Mini LibriSpeech, demonstrates the effectiveness of the work, reducing Diarization Error Rate from 30.95% to 28.17%. We will release the source code on GitHub to allow further research and reproducibility. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

Report number: TAAI2023-Domestic-131

arXiv:2403.14057 [pdf]

Exploring Fermi Surface Nesting and the Nature of Heavy Quasiparticles in the Spin-Triplet Superconductor Candidate CeRh$_2$As$_2$

Authors: Bo Chen, Hao Liu, Qi-Yi Wu, Chen Zhang, Xue-Qing Ye, Yin-Zou Zhao, Jiao-Jiao Song, Xin-Yi Tian, Ba-Lei Tan, Zheng-Tai Liu, Mao Ye, Zhen-Hua Chen, Yao-Bo Huang, Da-Wei Shen, Ya-Hua Yuan, Jun He, Yu-Xia Duan, Jian-Qiao Meng

Abstract: In this study, we investigate the electronic structure of a spin-triplet superconductor candidate CeRh$_2$As$_2$ using high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations. Notably, Fermi surface nesting hints at connections to magnetic excitation or quadrupole density wave phenomena, elucidating the superconducting mechanisms. Measured band structur… ▽ More In this study, we investigate the electronic structure of a spin-triplet superconductor candidate CeRh$_2$As$_2$ using high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations. Notably, Fermi surface nesting hints at connections to magnetic excitation or quadrupole density wave phenomena, elucidating the superconducting mechanisms. Measured band structures reveal primarily localized 4f electrons, with minor itinerant contributions. Additionally, a transition from localized to itinerant behavior and significant c-f hybridization anisotropy underscore the role of f-electrons in sha** electronic properties. These findings deepen our understanding of CeRh$_2$As$_2$'s unconventional superconductivity and magnetism. Further exploration promises advances in superconductivity research. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures

arXiv:2403.13437 [pdf, other]

Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be… ▽ More Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level. △ Less

Submitted 14 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13352 [pdf, other]

AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation

Authors: **gkun An, Yinghao Zhu, Zongjian Li, Haoran Feng, Bohua Chen, Yemin Shi, Chengwei Pan

Abstract: Text-to-Image (T2I) diffusion models have achieved remarkable success in image generation. Despite their progress, challenges remain in both prompt-following ability, image quality and lack of high-quality datasets, which are essential for refining these models. As acquiring labeled data is costly, we introduce AGFSync, a framework that enhances T2I diffusion models through Direct Preference Optim… ▽ More Text-to-Image (T2I) diffusion models have achieved remarkable success in image generation. Despite their progress, challenges remain in both prompt-following ability, image quality and lack of high-quality datasets, which are essential for refining these models. As acquiring labeled data is costly, we introduce AGFSync, a framework that enhances T2I diffusion models through Direct Preference Optimization (DPO) in a fully AI-driven approach. AGFSync utilizes Vision-Language Models (VLM) to assess image quality across style, coherence, and aesthetics, generating feedback data within an AI-driven loop. By applying AGFSync to leading T2I models such as SD v1.4, v1.5, and SDXL, our extensive experiments on the TIFA dataset demonstrate notable improvements in VQA scores, aesthetic evaluations, and performance on the HPSv2 benchmark, consistently outperforming the base models. AGFSync's method of refining T2I diffusion models paves the way for scalable alignment techniques. △ Less

Submitted 3 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13338 [pdf, other]

Adaptive Critical Subgraph Mining for Cognitive Impairment Conversion Prediction with T1-MRI-based Brain Network

Authors: Yilin Leng, Wenju Cui, Bai Chen, Xi Jiang, Shuangqing Chen, Jian Zheng

Abstract: Prediction the conversion to early-stage dementia is critical for mitigating its progression but remains challenging due to subtle cognitive impairments and structural brain changes. Traditional T1-weighted magnetic resonance imaging (T1-MRI) research focus on identifying brain atrophy regions but often fails to address the intricate connectivity between them. This limitation underscores the neces… ▽ More Prediction the conversion to early-stage dementia is critical for mitigating its progression but remains challenging due to subtle cognitive impairments and structural brain changes. Traditional T1-weighted magnetic resonance imaging (T1-MRI) research focus on identifying brain atrophy regions but often fails to address the intricate connectivity between them. This limitation underscores the necessity of focuing on inter-regional connectivity for a comprehensive understand of the brain's complex network. Moreover, there is a pressing demand for methods that adaptively preserve and extract critical information, particularly specialized subgraph mining techniques for brain networks. These are essential for develo** high-quality feature representations that reveal critical spatial impacts of structural brain changes and its topology. In this paper, we propose Brain-SubGNN, a novel graph representation network to mine and enhance critical subgraphs based on T1-MRI. This network provides a subgraph-level interpretation, enhancing interpretability and insights for graph analysis. The process begins by extracting node features and a correlation matrix between nodes to construct a task-oriented brain network. Brain-SubGNN then adaptively identifies and enhances critical subgraphs, capturing both loop and neighbor subgraphs. This method reflects the loop topology and local changes, indicative of long-range connections, and maintains local and global brain attributes. Extensive experiments validate the effectiveness and advantages of Brain-SubGNN, demonstrating its potential as a powerful tool for understanding and diagnosing early-stage dementia. Source code is available at https://github.com/Leng-10/Brain-SubGNN. △ Less

Submitted 26 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: 20 pages

arXiv:2403.13208 [pdf, other]

CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios using Real-World Trajectories

Authors: Peide Huang, Wenhao Ding, Jonathan Francis, Bingqing Chen, Ding Zhao

Abstract: Simulation is an indispensable tool in the development and testing of autonomous vehicles (AVs), offering an efficient and safe alternative to road testing by allowing the exploration of a wide range of scenarios. Despite its advantages, a significant challenge within simulation-based testing is the generation of safety-critical scenarios, which are essential to ensure that AVs can handle rare but… ▽ More Simulation is an indispensable tool in the development and testing of autonomous vehicles (AVs), offering an efficient and safe alternative to road testing by allowing the exploration of a wide range of scenarios. Despite its advantages, a significant challenge within simulation-based testing is the generation of safety-critical scenarios, which are essential to ensure that AVs can handle rare but potentially fatal situations. This paper addresses this challenge by introducing a novel generative framework, CaDRE, which is specifically designed for generating diverse and controllable safety-critical scenarios using real-world trajectories. Our approach optimizes for both the quality and diversity of scenarios by employing a unique formulation and algorithm that integrates real-world data, domain knowledge, and black-box optimization techniques. We validate the effectiveness of our framework through extensive testing in three representative types of traffic scenarios. The results demonstrate superior performance in generating diverse and high-quality scenarios with greater sample efficiency than existing reinforcement learning and sampling-based methods. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12660 [pdf, other]

ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems

Authors: Pengyue Jia, Ye**g Wang, Zhaocheng Du, Xiangyu Zhao, Yichao Wang, Bo Chen, Wanyu Wang, Huifeng Guo, Ruiming Tang

Abstract: Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing the accuracy and optimizing storage efficiencies to align with the deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core chal… ▽ More Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing the accuracy and optimizing storage efficiencies to align with the deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core challenges. Firstly, variant experimental setups across research papers often yield unfair comparisons, obscuring practical insights. Secondly, the existing literature's lack of detailed analysis on selection attributes, based on large-scale datasets and a thorough comparison among selection techniques and DRS backbones, restricts the generalizability of findings and impedes deployment on DRS. Lastly, research often focuses on comparing the peak performance achievable by feature selection methods, an approach that is typically computationally infeasible for identifying the optimal hyperparameters and overlooks evaluating the robustness and stability of these methods. To bridge these gaps, this paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for DRS. ERASE comprises a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches, across four public datasets, private industrial datasets, and a real-world commercial platform, achieving significant enhancement. Our code is available online for ease of reproduction. △ Less

Submitted 19 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted to KDD 2024

arXiv:2403.12211 [pdf, other]

A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness

Authors: Boqi Chen, Junier Oliva, Marc Niethammer

Abstract: Medical records often consist of different modalities, such as images, text, and tabular information. Integrating all modalities offers a holistic view of a patient's condition, while analyzing them longitudinally provides a better understanding of disease progression. However, real-world longitudinal medical records present challenges: 1) patients may lack some or all of the data for a specific t… ▽ More Medical records often consist of different modalities, such as images, text, and tabular information. Integrating all modalities offers a holistic view of a patient's condition, while analyzing them longitudinally provides a better understanding of disease progression. However, real-world longitudinal medical records present challenges: 1) patients may lack some or all of the data for a specific timepoint, and 2) certain modalities or views might be absent for all patients during a particular period. In this work, we introduce a unified model for longitudinal multi-modal multi-view prediction with missingness. Our method allows as many timepoints as desired for input, and aims to leverage all available data, regardless of their availability. We conduct extensive experiments on the knee osteoarthritis dataset from the Osteoarthritis Initiative for pain and Kellgren-Lawrence grade prediction at a future timepoint. We demonstrate the effectiveness of our method by comparing results from our unified model to specific models that use the same modality and view combinations during training and evaluation. We also show the benefit of having extended temporal data and provide post-hoc analysis for a deeper understanding of each modality/view's importance for different tasks. △ Less

Submitted 21 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11427 [pdf, other]

BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors

Authors: Tingyang Zhang, Qingzhe Gao, Weiyu Li, Libin Liu, Baoquan Chen

Abstract: Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for train… ▽ More Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for training and rendering. This limitation restricts the practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present the rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating its superior performance compared to the current state-of-the-art methods. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: https://talegqz.github.io/BAGS/

arXiv:2403.11057 [pdf, other]

Large Language Models Powered Context-aware Motion Prediction

Authors: Xiaoji Zheng, Lixiu Wu, Zhijie Yan, Yuanrong Tang, Hao Zhao, Chen Zhong, Bokui Chen, Jiangtao Gong

Abstract: Motion prediction is among the most fundamental tasks in autonomous driving. Traditional methods of motion forecasting primarily encode vector information of maps and historical trajectory data of traffic participants, lacking a comprehensive understanding of overall traffic semantics, which in turn affects the performance of prediction tasks. In this paper, we utilized Large Language Models (LLMs… ▽ More Motion prediction is among the most fundamental tasks in autonomous driving. Traditional methods of motion forecasting primarily encode vector information of maps and historical trajectory data of traffic participants, lacking a comprehensive understanding of overall traffic semantics, which in turn affects the performance of prediction tasks. In this paper, we utilized Large Language Models (LLMs) to enhance the global traffic context understanding for motion prediction tasks. We first conducted systematic prompt engineering, visualizing complex traffic environments and historical trajectory information of traffic participants into image prompts -- Transportation Context Map (TC-Map), accompanied by corresponding text prompts. Through this approach, we obtained rich traffic context information from the LLM. By integrating this information into the motion prediction model, we demonstrate that such context can enhance the accuracy of motion predictions. Furthermore, considering the cost associated with LLMs, we propose a cost-effective deployment strategy: enhancing the accuracy of motion prediction tasks at scale with 0.7\% LLM-augmented datasets. Our research offers valuable insights into enhancing the understanding of traffic scenes of LLMs and the motion prediction performance of autonomous driving. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: 6 pages,4 figures

MSC Class: 68T45

arXiv:2403.10962 [pdf, other]

Exploiting Topological Priors for Boosting Point Cloud Generation

Authors: Baiyuan Chen

Abstract: This paper presents an innovative enhancement to the Sphere as Prior Generative Adversarial Network (SP-GAN) model, a state-of-the-art GAN designed for point cloud generation. A novel method is introduced for point cloud generation that elevates the structural integrity and overall quality of the generated point clouds by incorporating topological priors into the training process of the generator.… ▽ More This paper presents an innovative enhancement to the Sphere as Prior Generative Adversarial Network (SP-GAN) model, a state-of-the-art GAN designed for point cloud generation. A novel method is introduced for point cloud generation that elevates the structural integrity and overall quality of the generated point clouds by incorporating topological priors into the training process of the generator. Specifically, this work utilizes the K-means algorithm to segment a point cloud from the repository into clusters and extract centroids, which are then used as priors in the generation process of the SP-GAN. Furthermore, the discriminator component of the SP-GAN utilizes the identical point cloud that contributed the centroids, ensuring a coherent and consistent learning environment. This strategic use of centroids as intuitive guides not only boosts the efficiency of global feature learning but also substantially improves the structural coherence and fidelity of the generated point clouds. By applying the K-means algorithm to generate centroids as the prior, the work intuitively and experimentally demonstrates that such a prior enhances the quality of generated point clouds. △ Less

Submitted 26 April, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

Comments: 7 pages, 3 figures

arXiv:2403.10877 [pdf, ps, other]

Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (637 additional authors not shown)

Abstract: We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an a… ▽ More We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominated $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: 9 pages, 3 figures

arXiv:2403.10749 [pdf, other]

Evolution of chirality from transverse wobbling in $^{135}$Pr

Authors: N. Sensharma, U. Garg, Q. B. Chen, S. Frauendorf, S. Zhu, J. Arroyo, A. D. Ayangeakaa, D. P. Burdette, M. P. Carpenter, P. Copp, J. L. Cozzi, S. S. Ghugre, D. J. Hartley, K. B. Howard, R. V. F. Janssens, F. G. Kondev, T. Lauritsen, J. Li, R. Palit, A. Saracino, D. Seweryniak, S. Weyhmiller, J. Wu

Abstract: Chirality is a distinct signature that characterizes triaxial shapes in nuclei. We report the first observation of chirality in the nucleus $^{135}$Pr using a high-statistics Gammasphere experiment with the $^{123}$Sb($^{16}$O,4n)$^{135}$Pr reaction. Two chiral-partner bands with the configuration $π(1h_{11/2})^1\otimes ν(1h_{11/2})^{-2}$ have been identified in this nucleus. Angular distribution… ▽ More Chirality is a distinct signature that characterizes triaxial shapes in nuclei. We report the first observation of chirality in the nucleus $^{135}$Pr using a high-statistics Gammasphere experiment with the $^{123}$Sb($^{16}$O,4n)$^{135}$Pr reaction. Two chiral-partner bands with the configuration $π(1h_{11/2})^1\otimes ν(1h_{11/2})^{-2}$ have been identified in this nucleus. Angular distribution analyses of the $ΔI = 1$ connecting transitions between the two chiral partners have revealed a dominant dipole character. Quasiparticle triaxial rotor model calculations are in good agreement with the experiment. This is the first time that both signatures of triaxiality--chirality and wobbling--have been observed in the same nucleus. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: 11 pages, 11 figures

arXiv:2403.09895 [pdf, other]

Overcoming the cohesive zone limit in the modelling of composites delamination with TUBA cohesive elements

Authors: Giorgio Tosti Balducci, Boyang Chen

Abstract: The wide adoption of composite structures in the aerospace industry requires reliable numerical methods to account for the effects of various damage mechanisms, including delamination. Cohesive elements are a versatile and physically representative way of modelling delamination. However, using their standard form which conforms to solid substrate elements, multiple elements are required in the nar… ▽ More The wide adoption of composite structures in the aerospace industry requires reliable numerical methods to account for the effects of various damage mechanisms, including delamination. Cohesive elements are a versatile and physically representative way of modelling delamination. However, using their standard form which conforms to solid substrate elements, multiple elements are required in the narrow cohesive zone, thereby requiring an excessively fine mesh and hindering the applicability in practical scenarios. The present work focuses on the implementation and testing of triangular thin plate substrate elements and compatible cohesive elements, which satisfy C1-continuity in the domain. The improved regularity meets the continuity requirement coming from the Kirchhoff Plate Theory and the triangular shape allows for conformity to complex geometries. The overall model is validated for mode I delamination, the case with the smallest cohesive zone. Very accurate predictions of the limit load and crack propagation phase are achieved, using elements as large as 11 times the cohesive zone. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.08654 [pdf, other]

An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

Abstract: Self-supervised speech representation learning enables the extraction of meaningful features from raw waveforms. These features can then be efficiently used across multiple downstream tasks. However, two significant issues arise when considering the deployment of such methods ``in-the-wild": (i) Their large size, which can be prohibitive for edge applications; and (ii) their robustness to detrimen… ▽ More Self-supervised speech representation learning enables the extraction of meaningful features from raw waveforms. These features can then be efficiently used across multiple downstream tasks. However, two significant issues arise when considering the deployment of such methods ``in-the-wild": (i) Their large size, which can be prohibitive for edge applications; and (ii) their robustness to detrimental factors, such as noise and/or reverberation, that can heavily degrade the performance of such systems. In this work, we propose RobustDistiller, a novel knowledge distillation mechanism that tackles both problems jointly. Simultaneously to the distillation recipe, we apply a multi-task learning objective to encourage the network to learn noise-invariant representations by denoising the input. The proposed mechanism is evaluated on twelve different downstream tasks. It outperforms several benchmarks regardless of noise type, or noise and reverberation levels. Experimental results show that the new Student model with 23M parameters can achieve results comparable to the Teacher model with 95M parameters. Lastly, we show that the proposed recipe can be applied to other distillation methodologies, such as the recent DPWavLM. For reproducibility, code and model checkpoints will be made available at \mbox{\url{https://github.com/Hguimaraes/robustdistiller}}. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: Under review on IEEE Transactions on Audio, Speech, and Language Processing (2024)

arXiv:2403.08600 [pdf]

Evaluation of Control/User-Plane Denial-of-Service (DoS) Attack on O-RAN Fronthaul Interface

Authors: Ferlinda Feliana, Ting-Wei Hung, Binbin Chen, Ray-Guang Cheng

Abstract: The open fronthaul interface defined by O-RAN ALLIANCE aims to support the interoperability between multi-vendor open radio access network (O-RAN) radio units (O-RU) and O-RAN distributed units (O-DU). This paper introduces a new tool that could be used to evaluate Denial-of-Service (DoS) attacks against the open fronthaul interface. We launched an array of control/user planes (C/U-Planes) attacks… ▽ More The open fronthaul interface defined by O-RAN ALLIANCE aims to support the interoperability between multi-vendor open radio access network (O-RAN) radio units (O-RU) and O-RAN distributed units (O-DU). This paper introduces a new tool that could be used to evaluate Denial-of-Service (DoS) attacks against the open fronthaul interface. We launched an array of control/user planes (C/U-Planes) attacks with the tool under different traffic types and data rates, and we evaluated their impacts on the throughput and block error rate (BLER) of real-world O-RAN systems with commercial hardware. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE INFOCOM Workshop: Next-generation Open and Programmable Radio Access Networks (NG-OPERA)

arXiv:2403.07837 [pdf, other]

Topological Protection of Optical Skyrmions through Complex Media

Authors: An Aloysius Wang, Zimo Zhao, Yifei Ma, Yuxi Cai, Tade Marozsak, Binguo Chen, Honghui He, Lin Luo, Martin J Booth, Steve J Elston, Stephen M Morris, Chao He

Abstract: Recent experimental realizations of optical Skyrmions through the techniques of structured light have opened the doors to a completely new way of representing data in electromagnetic fields, namely its topology. Apart from potentially enhancing the bandwidth of optical systems, the intrinsically discrete nature of the topological number allows Skyrmions to naturally interface with the digital worl… ▽ More Recent experimental realizations of optical Skyrmions through the techniques of structured light have opened the doors to a completely new way of representing data in electromagnetic fields, namely its topology. Apart from potentially enhancing the bandwidth of optical systems, the intrinsically discrete nature of the topological number allows Skyrmions to naturally interface with the digital world. However, investigations into the topological protection of optical Skyrmions through various media remain limited to date. Here, we rigorously define the optical Skyrmion and establish a framework that can be used to analyze the effects of complex media on the topology of Skyrmion fields. Using this framework, we provide simple criteria for spatially varying retarders, diattenuators, depolarizers, and combinations of the former under which topological protection is guaranteed. We then present experimental results validating the robustness of the Skyrmion number against corrupting media and discuss ways of extending the optical Skyrmion to more general settings. We believe that the work presented in this paper provides a theoretical underpinning for the use of Skyrmions in practical applications ranging from optical communications to photonic computing. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06886 [pdf, other]

QED Effects on Kerr-Newman Black Hole Shadows

Authors: Shaobing Yuan, Changkai Luo, Zezhou Hu, Zhenyu Zhang, Bin Chen

Abstract: Incorporating first-order QED effects, we explore the shadows of Kerr-Newman black holes with a magnetic charge through the numerical backward ray-tracing method. Our investigation accounts for both the direct influence of the electromagnetic field on light rays and the distortion of the background spacetime metric due to QED corrections. We notice that the area of the shadow increases with the QE… ▽ More Incorporating first-order QED effects, we explore the shadows of Kerr-Newman black holes with a magnetic charge through the numerical backward ray-tracing method. Our investigation accounts for both the direct influence of the electromagnetic field on light rays and the distortion of the background spacetime metric due to QED corrections. We notice that the area of the shadow increases with the QED effect, mainly due to the fact that the photons move more slowly in the effective medium and become easier to be trapped by the black hole. △ Less

Submitted 26 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures; v2: references added and minor revisions

arXiv:2403.06766 [pdf, other]

Determination of the number of $ψ(3686)$ events taken at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be… ▽ More The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$. △ Less

Submitted 28 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06394 [pdf, other]

FSViewFusion: Few-Shots View Generation of Novel Objects

Authors: Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim

Abstract: Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, Nerf models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion mode… ▽ More Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, Nerf models overfit on a single scene, lacking generalization to out of distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text to image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. Our research reveals two interesting findings. First, we observe that Dreambooth can learn the high level concept of a view, compared to arguably more complex strategies which involve finetuning diffusions on large amounts of multi-view data. Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the original object's identify from which the views are learnt. Motivated by this, we introduce a learning strategy, FSViewFusion, which inherits a specific view through only one image sample of a single scene, and transfers the knowledge to a novel object, learnt from few shots, using low rank adapters. Through extensive experiments we demonstrate that our method, albeit simple, is efficient in generating reliable view samples for in the wild images. Code and models will be released. △ Less

Submitted 12 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05406 [pdf, other]

Considering Nonstationary within Multivariate Time Series with Variational Hierarchical Transformer for Forecasting

Authors: Muyao Wang, Wenchao Chen, Bo Chen

Abstract: The forecasting of Multivariate Time Series (MTS) has long been an important but challenging task. Due to the non-stationary problem across long-distance time steps, previous studies primarily adopt stationarization method to attenuate the non-stationary problem of the original series for better predictability. However, existing methods always adopt the stationarized series, which ignores the inhe… ▽ More The forecasting of Multivariate Time Series (MTS) has long been an important but challenging task. Due to the non-stationary problem across long-distance time steps, previous studies primarily adopt stationarization method to attenuate the non-stationary problem of the original series for better predictability. However, existing methods always adopt the stationarized series, which ignores the inherent non-stationarity, and has difficulty in modeling MTS with complex distributions due to the lack of stochasticity. To tackle these problems, we first develop a powerful hierarchical probabilistic generative module to consider the non-stationarity and stochastic characteristics within MTS, and then combine it with transformer for a well-defined variational generative dynamic model named Hierarchical Time series Variational Transformer (HTV-Trans), which recovers the intrinsic non-stationary information into temporal dependencies. Being a powerful probabilistic model, HTV-Trans is utilized to learn expressive representations of MTS and applied to forecasting tasks. Extensive experiments on diverse datasets show the efficiency of HTV-Trans on MTS forecasting tasks △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: accepted by AAAI2024

arXiv:2403.05283 [pdf]

Closely piling up of multiple adhesive fronts in adhesive friction due to re-attachment

Authors: Puyu Cao, Meicheng Yao, Bin Chen

Abstract: To understand why the adhesive frictional force was in linear proportion to the real contact area in experiments, we investigate the adhesive friction generated by sliding elastic solids adhered to a rigid surface via multiple adhesive springs. Our results indicate that the shear-off force of the interface increases with the energetically guided re-attachment rate of adhesive springs, reaching sat… ▽ More To understand why the adhesive frictional force was in linear proportion to the real contact area in experiments, we investigate the adhesive friction generated by sliding elastic solids adhered to a rigid surface via multiple adhesive springs. Our results indicate that the shear-off force of the interface increases with the energetically guided re-attachment rate of adhesive springs, reaching saturation at high re-attachment rates. Remarkably, this shear-off force can surpass the predictions made by the fracture theory. By plotting the adhesive forces along the interface, we observe substantial high adhesive forces distributed throughout the interface, based on which we identify multiple adhesive fronts closely piling up along the interface. These regions can exhibit similar force profiles, and their number appears to increase with the size of the interface, leading to a linear increase in the calculated shear-off force with the size of the interface. We then suggest that multiple adhesive fronts closely pile up to back up each other in adhesive friction due to re-attachments, which may provide profound insights into understanding the observed phenomena associated with adhesive friction along an interface. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05067 [pdf, other]

Prediction of turbulent energy based on low-rank resolvent modes and machine learning

Authors: Yitong Fan, Bo Chen, Weipeng Li

Abstract: A modelling framework based on the resolvent analysis and machine learning is proposed to predict the turbulent energy in incompressible channel flows. In the framework, the optimal resolvent response modes are selected as the basis functions modelling the low-rank behaviour of high-dimensional nonlinear turbulent flow-fields, and the corresponding weight functions are determined by data-driven ne… ▽ More A modelling framework based on the resolvent analysis and machine learning is proposed to predict the turbulent energy in incompressible channel flows. In the framework, the optimal resolvent response modes are selected as the basis functions modelling the low-rank behaviour of high-dimensional nonlinear turbulent flow-fields, and the corresponding weight functions are determined by data-driven neural networks. Turbulent-energy distribution in space and scales, at the friction Reynolds number 1000, is predicted and compared to the data of direct numerical simulation. Close agreement is observed, suggesting the feasibility and reliability of the proposed framework for turbulence prediction. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: 13 pages, 8 figures

arXiv:2403.04797 [pdf, other]

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

Authors: Zhenyu Zhang, Run** Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang

Abstract: This paper aims to overcome the "lost-in-the-middle" challenge of large language models (LLMs). While recent advancements have successfully enabled LLMs to perform stable language modeling with up to 4 million tokens, the persistent difficulty faced by most LLMs in identifying relevant information situated in the middle of the context has not been adequately tackled. To address this problem, this… ▽ More This paper aims to overcome the "lost-in-the-middle" challenge of large language models (LLMs). While recent advancements have successfully enabled LLMs to perform stable language modeling with up to 4 million tokens, the persistent difficulty faced by most LLMs in identifying relevant information situated in the middle of the context has not been adequately tackled. To address this problem, this paper introduces Multi-scale Positional Encoding (Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle the relevant information located in the middle of the context, without fine-tuning or introducing any additional overhead. Ms-PoE leverages the position indice rescaling to relieve the long-term decay effect introduced by RoPE, while meticulously assigning distinct scaling ratios to different attention heads to preserve essential knowledge learned during the pre-training step, forming a multi-scale context fusion from short to long distance. Extensive experiments with a wide range of LLMs demonstrate the efficacy of our approach. Notably, Ms-PoE achieves an average accuracy gain of up to 3.8 on the Zero-SCROLLS benchmark over the original LLMs. Code are available at https://github.com/VITA-Group/Ms-PoE. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.04652 [pdf, other]

Yi: Open Foundation Models by 01.AI

Authors: 01. AI, :, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, **g Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie , et al. (7 additional authors not shown)

Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,… ▽ More We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver strong human preference rate on major evaluation platforms like AlpacaEval and Chatbot Arena. Building upon our scalable super-computing infrastructure and the classical transformer architecture, we attribute the performance of Yi models primarily to its data quality resulting from our data-engineering efforts. For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline. For finetuning, we polish a small scale (less than 10K) instruction dataset over multiple iterations such that every single instance has been verified directly by our machine learning engineers. For vision-language, we combine the chat language model with a vision transformer encoder and train the model to align visual representations to the semantic space of the language model. We further extend the context length to 200K through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. We show that extending the depth of the pretrained checkpoint through continual pretraining further improves performance. We believe that given our current results, continuing to scale up model parameters using thoroughly optimized data will lead to even stronger frontier models. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03715 [pdf, other]

MeaCap: Memory-Augmented Zero-shot Image Captioning

Authors: Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

Abstract: Zero-shot image captioning (IC) without well-paired image-text data can be divided into two categories, training-free and text-only-training. Generally, these two types of methods realize zero-shot IC by integrating pretrained vision-language models like CLIP for image-text similarity evaluation and a pre-trained language model (LM) for caption generation. The main difference between them is wheth… ▽ More Zero-shot image captioning (IC) without well-paired image-text data can be divided into two categories, training-free and text-only-training. Generally, these two types of methods realize zero-shot IC by integrating pretrained vision-language models like CLIP for image-text similarity evaluation and a pre-trained language model (LM) for caption generation. The main difference between them is whether using a textual corpus to train the LM. Though achieving attractive performance w.r.t. some metrics, existing methods often exhibit some common drawbacks. Training-free methods tend to produce hallucinations, while text-only-training often lose generalization capability. To move forward, in this paper, we propose a novel Memory-Augmented zero-shot image Captioning framework (MeaCap). Specifically, equipped with a textual memory, we introduce a retrieve-then-filter module to get key concepts that are highly related to the image. By deploying our proposed memory-augmented visual-related fusion score in a keywords-to-sentence LM, MeaCap can generate concept-centered captions that keep high consistency with the image with fewer hallucinations and more world-knowledge. The framework of MeaCap achieves the state-of-the-art performance on a series of zero-shot IC settings. Our code is available at https://github.com/joeyz0z/MeaCap. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2403.03689 [pdf, other]

General2Specialized LLMs Translation for E-commerce

Authors: Kaidi Chen, Ben Chen, Dehong Gao, Huangyu Dai, Wen Jiang, Wei Ning, Shanqing Yu, Libin Yang, Xiaoyan Cai

Abstract: Existing Neural Machine Translation (NMT) models mainly handle translation in the general domain, while overlooking domains with special writing formulas, such as e-commerce and legal documents. Taking e-commerce as an example, the texts usually include amounts of domain-related words and have more grammar problems, which leads to inferior performances of current NMT methods. To address these prob… ▽ More Existing Neural Machine Translation (NMT) models mainly handle translation in the general domain, while overlooking domains with special writing formulas, such as e-commerce and legal documents. Taking e-commerce as an example, the texts usually include amounts of domain-related words and have more grammar problems, which leads to inferior performances of current NMT methods. To address these problems, we collect two domain-related resources, including a set of term pairs (aligned Chinese-English bilingual terms) and a parallel corpus annotated for the e-commerce domain. Furthermore, we propose a two-step fine-tuning paradigm (named G2ST) with self-contrastive semantic enhancement to transfer one general NMT model to the specialized NMT model for e-commerce. The paradigm can be used for the NMT models based on Large language models (LLMs). Extensive evaluations on real e-commerce titles demonstrate the superior translation quality and robustness of our G2ST approach, as compared with state-of-the-art NMT models such as LLaMA, Qwen, GPT-3.5, and even GPT-4. △ Less

Submitted 6 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: 4 pages, 1 figure, WWW2024 accepted

arXiv:2403.03536 [pdf, other]

Towards Efficient and Effective Unlearning of Large Language Models for Recommendation

Authors: Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu

Abstract: The significant advancements in large language models (LLMs) give rise to a promising research direction, i.e., leveraging LLMs as recommenders (LLMRec). The efficacy of LLMRec arises from the open-world knowledge and reasoning capabilities inherent in LLMs. LLMRec acquires the recommendation capabilities through instruction tuning based on user interaction data. However, in order to protect user… ▽ More The significant advancements in large language models (LLMs) give rise to a promising research direction, i.e., leveraging LLMs as recommenders (LLMRec). The efficacy of LLMRec arises from the open-world knowledge and reasoning capabilities inherent in LLMs. LLMRec acquires the recommendation capabilities through instruction tuning based on user interaction data. However, in order to protect user privacy and optimize utility, it is also crucial for LLMRec to intentionally forget specific user data, which is generally referred to as recommendation unlearning. In the era of LLMs, recommendation unlearning poses new challenges for LLMRec in terms of \textit{inefficiency} and \textit{ineffectiveness}. Existing unlearning methods require updating billions of parameters in LLMRec, which is costly and time-consuming. Besides, they always impact the model utility during the unlearning process. To this end, we propose \textbf{E2URec}, the first \underline{E}fficient and \underline{E}ffective \underline{U}nlearning method for LLM\underline{Rec}. Our proposed E2URec enhances the unlearning efficiency by updating only a few additional LoRA parameters, and improves the unlearning effectiveness by employing a teacher-student framework, where we maintain multiple teacher networks to guide the unlearning process. Extensive experiments show that E2URec outperforms state-of-the-art baselines on two real-world datasets. Specifically, E2URec can efficiently forget specific data without affecting recommendation performance. The source code is at \url{https://github.com/justarter/E2URec}. △ Less

Submitted 30 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted by Frontier of Computer Science

arXiv:2403.03507 [pdf, other]

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Authors: Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian

Abstract: Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform… ▽ More Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform training with full-rank weights in both pre-training and fine-tuning stages since they limit the parameter search to a low-rank subspace and alter the training dynamics, and further, may require full-rank warm start. In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallel, checkpointing, or offloading strategies. △ Less

Submitted 2 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: ICML 2024 (Oral)

arXiv:2403.03500 [pdf, other]

Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to… ▽ More Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 3 figures

arXiv:2403.03258 [pdf, other]

Interaction-driven Roton Condensation in C = 2/3 Fractional Quantum Anomalous Hall State

Authors: Hongyu Lu, Han-Qing Wu, Bin-Bin Chen, Kai Sun, Zi Yang Meng

Abstract: The interplay of topological order and charge order exhibits rich physics. Recent experiments that succesfully realized the frational quantum anomalous Hall (FQAH) effect in twisted MoTe$_2$ bilayers and rhombohedral multilayer graphene without external magnetic field further call for deeper understanding of the relation between topological order and charge order in quantum moiré materials. In the… ▽ More The interplay of topological order and charge order exhibits rich physics. Recent experiments that succesfully realized the frational quantum anomalous Hall (FQAH) effect in twisted MoTe$_2$ bilayers and rhombohedral multilayer graphene without external magnetic field further call for deeper understanding of the relation between topological order and charge order in quantum moiré materials. In the archetypal correlated flat-band model on checkerboard lattice, a FQAH smectic state with coexistent topological order and smectic charge order has been numerically discovered at filling $ν$ = 2/3. In this work, we explore the global ground-state phase diagram of the model with competing interactions and find a C = 2/3 FQAH phase surrounded by four different charge density wave (CDW) phases. In particular, we identify a FQAH-CDW transition triggered by roton condensation, in that, the minimal roton gap continues to decrease at the same finite momentum, along with the diverging density flucuations at the transition point, after which the system enters into a CDW metal phase with the same ordered wavevector. Our discovery points out that the charge-neutral roton modes can play a significant role in a transition from FQAH topological order to CDW symmetry-breaking order, discussed in FQH literature while severely neglected in FQAH systems. △ Less

Submitted 10 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02950 [pdf, other]

A general approach to enhance the survivability of backdoor attacks by decision path coupling

Authors: Yufei Zhao, Dingji Wang, Bihuan Chen, Ziqian Chen, Xin Peng

Abstract: Backdoor attacks have been one of the emerging security threats to deep neural networks (DNNs), leading to serious consequences. One of the mainstream backdoor defenses is model reconstruction-based. Such defenses adopt model unlearning or pruning to eliminate backdoors. However, little attention has been paid to survive from such defenses. To bridge the gap, we propose Venom, the first generic ba… ▽ More Backdoor attacks have been one of the emerging security threats to deep neural networks (DNNs), leading to serious consequences. One of the mainstream backdoor defenses is model reconstruction-based. Such defenses adopt model unlearning or pruning to eliminate backdoors. However, little attention has been paid to survive from such defenses. To bridge the gap, we propose Venom, the first generic backdoor attack enhancer to improve the survivability of existing backdoor attacks against model reconstruction-based defenses. We formalize Venom as a binary-task optimization problem. The first is the original backdoor attack task to preserve the original attack capability, while the second is the attack enhancement task to improve the attack survivability. To realize the second task, we propose attention imitation loss to force the decision path of poisoned samples in backdoored models to couple with the crucial decision path of benign samples, which makes backdoors difficult to eliminate. Our extensive evaluation on two DNNs and three datasets has demonstrated that Venom significantly improves the survivability of eight state-of-the-art attacks against eight state-of-the-art defenses without impacting the capability of the original attacks. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02828 [pdf, other]

A chip-integrated comb-based microwave oscillator

Authors: Wei Sun, Zhiyang Chen, Linze Li, Chen Shen, **bao Long, Huamin Zheng, Luyu Yang, Qiushi Chen, Zhouze Zhang, Baoqi Shi, Shichang Li, Lan Gao, Yi-Han Luo, Baile Chen, Junqiu Liu

Abstract: Low-noise microwave oscillators are cornerstones for wireless communication, radar and clocks. Optical frequency combs have enabled photonic microwaves with unrivalled noise performance and bandwidth. Emerging interest is to generate microwaves using chip-based frequency combs, namely microcombs. Here, we demonstrate the first, fully integrated, microcomb-based, microwave oscillator chip. The chip… ▽ More Low-noise microwave oscillators are cornerstones for wireless communication, radar and clocks. Optical frequency combs have enabled photonic microwaves with unrivalled noise performance and bandwidth. Emerging interest is to generate microwaves using chip-based frequency combs, namely microcombs. Here, we demonstrate the first, fully integrated, microcomb-based, microwave oscillator chip. The chip, powered by a microelectronic circuit, leverages hybrid integration of a DFB laser, a nonlinear microresonator, and a high-speed photodetector. Each component represents the best of its own class, yet allows large-volume manufacturing with low cost in CMOS foundries. The hybrid chip outputs an ultralow-noise laser of 6.9 Hz linewidth, a microcomb of 10.7 GHz repetition rate, and a 10.7 GHz microwave of 6.3 mHz linewidth -- all three in one entity of 76 mm$^2$ size.The microwave phase noise reaches -75/-105/-130 dBc/Hz at 1/10/100 kHz Fourier offset frequency. Our results can reinvigorate our information society for communication, sensing, timing and precision measurement. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01792 [pdf, other]

ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Abstract: Speech separation has recently made significant progress thanks to the fine-grained vision used in time-domain methods. However, several studies have shown that adopting Short-Time Fourier Transform (STFT) for feature extraction could be beneficial when encountering harsher conditions, such as noise or reverberation. Therefore, we propose a magnitude-conditioned time-domain framework, ConSep, to i… ▽ More Speech separation has recently made significant progress thanks to the fine-grained vision used in time-domain methods. However, several studies have shown that adopting Short-Time Fourier Transform (STFT) for feature extraction could be beneficial when encountering harsher conditions, such as noise or reverberation. Therefore, we propose a magnitude-conditioned time-domain framework, ConSep, to inherit the beneficial characteristics. The experiment shows that ConSep promotes performance in anechoic, noisy, and reverberant settings compared to two celebrated methods, SepFormer and Bi-Sep. Furthermore, we visualize the components of ConSep to strengthen the advantages and cohere with the actualities we have found in preliminary studies. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01785 [pdf, other]

What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution

Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Abstract: This study introduces a reformed Sinc-convolution (Sincconv) framework tailored for the encoder component of deep networks for speech enhancement (SE). The reformed Sincconv, based on parametrized sinc functions as band-pass filters, offers notable advantages in terms of training efficiency, filter diversity, and interpretability. The reformed Sinc-conv is evaluated in conjunction with various SE… ▽ More This study introduces a reformed Sinc-convolution (Sincconv) framework tailored for the encoder component of deep networks for speech enhancement (SE). The reformed Sincconv, based on parametrized sinc functions as band-pass filters, offers notable advantages in terms of training efficiency, filter diversity, and interpretability. The reformed Sinc-conv is evaluated in conjunction with various SE models, showcasing its ability to boost SE performance. Furthermore, the reformed Sincconv provides valuable insights into the specific frequency components that are prioritized in an SE scenario. This opens up a new direction of SE research and improving our knowledge of their operating dynamics. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01761 [pdf, other]

Observation of $ψ(3686)\to 3φ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (645 additional authors not shown)

Abstract: Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant str… ▽ More Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01710 [pdf, other]

Sensor-based Multi-Robot Coverage Control with Spatial Separation in Unstructured Environments

Authors: Xinyi Wang, Jiwen Xu, Chuanxiang Gao, Yizhou Chen, Jihan Zhang, Chenggang Wang, Ben M. Chen

Abstract: Multi-robot systems have increasingly become instrumental in tackling search and coverage problems. However, the challenge of optimizing task efficiency without compromising task success still persists, particularly in expansive, unstructured environments with dense obstacles. This paper presents an innovative, decentralized Voronoi-based approach for search and coverage to reactively navigate t… ▽ More Multi-robot systems have increasingly become instrumental in tackling search and coverage problems. However, the challenge of optimizing task efficiency without compromising task success still persists, particularly in expansive, unstructured environments with dense obstacles. This paper presents an innovative, decentralized Voronoi-based approach for search and coverage to reactively navigate these complexities while maintaining safety. This approach leverages the active sensing capabilities of multi-robot systems to supplement GIS (Geographic Information System), offering a more comprehensive and real-time understanding of the environment. Based on point cloud data, which is inherently non-convex and unstructured, this method efficiently generates collision-free Voronoi regions using only local sensing information through spatial decomposition and spherical mirroring techniques. Then, deadlock-aware guided map integrated with a gradient-optimized, centroid Voronoi-based coverage control policy, is constructed to improve efficiency by avoiding exhaustive searches and local sensing pitfalls. The effectiveness of our algorithm has been validated through extensive numerical simulations in high-fidelity environments, demonstrating significant improvements in both task success rate, coverage ratio, and task execution time compared with others. △ Less

Submitted 10 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2403.01154 [pdf, ps, other]

Boundedness of klt complements on Fano fibrations over surfaces

Authors: Bingyi Chen

Abstract: Let $(X,B)$ be an $ε$-lc pair of dimension $d$ with a closed point $x\in X$. Birkar conjectured that there is an effective Cartier divisor $H$ passing through $x$ such that $(X,B+tH)$ is lc near $x$, where $t$ is a positive real number depending only on $d,ε$. We prove that Birkar's conjecture is equivalent to Shokurov's conjecture on boundedness of klt complements on Fano fibrations and we confir… ▽ More Let $(X,B)$ be an $ε$-lc pair of dimension $d$ with a closed point $x\in X$. Birkar conjectured that there is an effective Cartier divisor $H$ passing through $x$ such that $(X,B+tH)$ is lc near $x$, where $t$ is a positive real number depending only on $d,ε$. We prove that Birkar's conjecture is equivalent to Shokurov's conjecture on boundedness of klt complements on Fano fibrations and we confirm Birkar's conjecture in dimension 2. As a corollary, we prove the boundedness of klt complements on Fano fibrations over surfaces. △ Less

Submitted 26 April, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

Comments: Version 2, showed that Shokurov's conjecture implies Birkar's conjecture, so they are equivalent (see Theorem 1.6). Version 3, added Example 1.10 to indicate that the order O(ε^2) in Theorem 1.7 is optimal. arXiv admin note: text overlap with arXiv:1811.10709 by other authors

arXiv:2403.00985 [pdf, other]

Episodic energy release during the main- and post-impulsive phase of a solar flare

Authors: Yuqian Wei, Bin Chen, Sijie Yu, Haimin Wang, Yixian Zhang, Lindsay Glesener

Abstract: When and where the magnetic field energy is released and converted in eruptive solar flares remains an outstanding topic in solar physics. To shed light on this question, here we report multi-wavelength observations of a C9.4-class eruptive limb flare that occurred on 2017 August 20. The flare, accompanied by a magnetic flux rope eruption and a white light coronal mass ejection, features three pos… ▽ More When and where the magnetic field energy is released and converted in eruptive solar flares remains an outstanding topic in solar physics. To shed light on this question, here we report multi-wavelength observations of a C9.4-class eruptive limb flare that occurred on 2017 August 20. The flare, accompanied by a magnetic flux rope eruption and a white light coronal mass ejection, features three post-impulsive X-ray and microwave bursts immediately following its main impulsive phase. For each burst, both microwave and X-ray imaging suggest that the non-thermal electrons are located in the above-the-loop-top region. Interestingly, contrary to many other flares, the peak flux of the three post-impulsive microwave and X-ray bursts shows an increase for later bursts. Spectral analysis reveals that the sources have a hardening spectral index, suggesting a more efficient electron acceleration into the later post-impulsive bursts. We observe a positive correlation between the acceleration of the magnetic flux rope and the non-thermal energy release during the post-impulsive bursts in the same event. Intriguingly, different from some other eruptive events, this correlation does not hold for the main impulse phase of this event, which we interpret as energy release due to the tether-cutting reconnection before the primary flux rope acceleration occurs. In addition, using footpoint brightenings at conjugate flare ribbons, a weakening reconnection guide field is inferred, which may also contribute to the hardening of the non-thermal electrons during the post-impulsive phase. △ Less

Submitted 1 March, 2024; originally announced March 2024.

Comments: 21 pages, 14 figures

arXiv:2402.19248 [pdf, other]

Let LLMs Take on the Latest Challenges! A Chinese Dynamic Question Answering Benchmark

Authors: Zhikun Xu, Yinghui Li, Ruixue Ding, Xinyu Wang, Boli Chen, Yong Jiang, Hai-Tao Zheng, Wenlian Lu, Pengjun Xie, Fei Huang

Abstract: How to better evaluate the capabilities of Large Language Models (LLMs) is the focal point and hot topic in current LLMs research. Previous work has noted that due to the extremely high cost of iterative updates of LLMs, they are often unable to answer the latest dynamic questions well. To promote the improvement of Chinese LLMs' ability to answer dynamic questions, in this paper, we introduce CDQ… ▽ More How to better evaluate the capabilities of Large Language Models (LLMs) is the focal point and hot topic in current LLMs research. Previous work has noted that due to the extremely high cost of iterative updates of LLMs, they are often unable to answer the latest dynamic questions well. To promote the improvement of Chinese LLMs' ability to answer dynamic questions, in this paper, we introduce CDQA, a Chinese Dynamic QA benchmark containing question-answer pairs related to the latest news on the Chinese Internet. We obtain high-quality data through a pipeline that combines humans and models, and carefully classify the samples according to the frequency of answer changes to facilitate a more fine-grained observation of LLMs' capabilities. We have also evaluated and analyzed mainstream and advanced Chinese LLMs on CDQA. Extensive experiments and valuable insights suggest that our proposed CDQA is challenging and worthy of more further study. We believe that the benchmark we provide will become one of the key data resources for improving LLMs' Chinese question-answering ability in the future. △ Less

Submitted 1 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Work in progress!

arXiv:2402.18951 [pdf, other]

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

Authors: Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang

Abstract: Open-world video recognition is challenging since traditional networks are not generalized well on complex environment variations. Alternatively, foundation models with rich knowledge have recently shown their generalization power. However, how to apply such knowledge has not been fully explored for open-world video recognition. To this end, we propose a generic knowledge transfer pipeline, which… ▽ More Open-world video recognition is challenging since traditional networks are not generalized well on complex environment variations. Alternatively, foundation models with rich knowledge have recently shown their generalization power. However, how to apply such knowledge has not been fully explored for open-world video recognition. To this end, we propose a generic knowledge transfer pipeline, which progressively exploits and integrates external multimodal knowledge from foundation models to boost open-world video recognition. We name it PCA, based on three stages of Percept, Chat, and Adapt. First, we perform Percept process to reduce the video domain gap and obtain external visual knowledge. Second, we generate rich linguistic semantics as external textual knowledge in Chat stage. Finally, we blend external multimodal knowledge in Adapt stage, by inserting multimodal knowledge adaptation modules into networks. We conduct extensive experiments on three challenging open-world video benchmarks, i.e., TinyVIRAT, ARID, and QV-Pipe. Our approach achieves state-of-the-art performance on all three datasets. △ Less

Submitted 29 February, 2024; originally announced February 2024.

Comments: 35 pages, 6 figures, 8 tables

arXiv:2402.17913 [pdf, other]

Using AI libraries for Incompressible Computational Fluid Dynamics

Authors: Boyang Chen, Claire E. Heaney, Christopher C. Pain

Abstract: Recently, there has been a huge effort focused on develo** highly efficient open source libraries to perform Artificial Intelligence (AI) related computations on different computer architectures (for example, CPUs, GPUs and new AI processors). This has not only made the algorithms based on these libraries highly efficient and portable between different architectures, but also has substantially s… ▽ More Recently, there has been a huge effort focused on develo** highly efficient open source libraries to perform Artificial Intelligence (AI) related computations on different computer architectures (for example, CPUs, GPUs and new AI processors). This has not only made the algorithms based on these libraries highly efficient and portable between different architectures, but also has substantially simplified the entry barrier to develop methods using AI. Here, we present a novel methodology to bring the power of both AI software and hardware into the field of numerical modelling by repurposing AI methods, such as Convolutional Neural Networks (CNNs), for the standard operations required in the field of the numerical solution of Partial Differential Equations (PDEs). The aim of this work is to bring the high performance, architecture agnosticism and ease of use into the field of the numerical solution of PDEs. We use the proposed methodology to solve the advection-diffusion equation, the non-linear Burgers equation and incompressible flow past a bluff body. For the latter, a convolutional neural network is used as a multigrid solver in order to enforce the incompressibility constraint. We show that the presented methodology can solve all these problems using repurposed AI libraries in an efficient way, and presents a new avenue to explore in the development of methods to solve PDEs and Computational Fluid Dynamics problems with implicit methods. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 24 pages, 6 figures

arXiv:2402.17189 [pdf]

An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement

Authors: Tzu-Ting Yang, Hsin-Wei Wang, Yi-Cheng Wang, Chi-Han Lin, Berlin Chen

Abstract: With the massive developments of end-to-end (E2E) neural networks, recent years have witnessed unprecedented breakthroughs in automatic speech recognition (ASR). However, the codeswitching phenomenon remains a major obstacle that hinders ASR from perfection, as the lack of labeled data and the variations between languages often lead to degradation of ASR performance. In this paper, we focus exclus… ▽ More With the massive developments of end-to-end (E2E) neural networks, recent years have witnessed unprecedented breakthroughs in automatic speech recognition (ASR). However, the codeswitching phenomenon remains a major obstacle that hinders ASR from perfection, as the lack of labeled data and the variations between languages often lead to degradation of ASR performance. In this paper, we focus exclusively on improving the acoustic encoder of E2E ASR to tackle the challenge caused by the codeswitching phenomenon. Our main contributions are threefold: First, we introduce a novel disentanglement loss to enable the lower-layer of the encoder to capture inter-lingual acoustic information while mitigating linguistic confusion at the higher-layer of the encoder. Second, through comprehensive experiments, we verify that our proposed method outperforms the prior-art methods using pretrained dual-encoders, meanwhile having access only to the codeswitching corpus and consuming half of the parameterization. Third, the apparent differentiation of the encoders' output features also corroborates the complementarity between the disentanglement loss and the mixture-of-experts (MoE) architecture. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: ICASSP 2024

arXiv:2402.16855 [pdf, other]

MB-RACS: Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network

Authors: Yujun Huang, Bin Chen, Naiqi Li, Baoyi An, Shu-Tao Xia, Yaowei Wang

Abstract: Conventional compressed sensing (CS) algorithms typically apply a uniform sampling rate to different image blocks. A more strategic approach could be to allocate the number of measurements adaptively, based on each image block's complexity. In this paper, we propose a Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network (MB-RACS) framework, which aims to adaptively determine the… ▽ More Conventional compressed sensing (CS) algorithms typically apply a uniform sampling rate to different image blocks. A more strategic approach could be to allocate the number of measurements adaptively, based on each image block's complexity. In this paper, we propose a Measurement-Bounds-based Rate-Adaptive Image Compressed Sensing Network (MB-RACS) framework, which aims to adaptively determine the sampling rate for each image block in accordance with traditional measurement bounds theory. Moreover, since in real-world scenarios statistical information about the original image cannot be directly obtained, we suggest a multi-stage rate-adaptive sampling strategy. This strategy sequentially adjusts the sampling ratio allocation based on the information gathered from previous samplings. We formulate the multi-stage rate-adaptive sampling as a convex optimization problem and address it using a combination of Newton's method and binary search techniques. Additionally, we enhance our decoding process by incorporating skip connections between successive iterations to facilitate a richer transmission of feature information across iterations. Our experiments demonstrate that the proposed MB-RACS method surpasses current leading methods, with experimental evidence also underscoring the effectiveness of each module within our proposed framework. △ Less

Submitted 18 January, 2024; originally announced February 2024.

Showing 201–250 of 3,237 results for author: Chen, B