Search | arXiv e-print repository

LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard… ▽ More We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard SFT model into a long-context model. To reduce the effort in collecting and annotating data for long-context language modeling, we develop two novel methods for creating synthetic data. These methods are applied during the continual pretraining phase as well as the Supervised Fine-Tuning (SFT) phase, greatly enhancing the training efficiency of our long-context LLMs. Our findings suggest that synthetic long-context SFT data can surpass the performance of data curated by humans to some extent. LongSkywork achieves outstanding performance on a variety of long-context benchmarks. In the Needle test, a benchmark for long-context information retrieval, our models achieved perfect accuracy across multiple context spans. Moreover, in realistic application scenarios, LongSkywork-13B demonstrates performance on par with Claude2.1, the leading long-context model, underscoring the effectiveness of our proposed methods. △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.00415 [pdf, other]

Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives

Authors: Xuan Wu, Di Wang, Lijie Wen, Yubin Xiao, Chunguo Wu, Yuesong Wu, Chaoyu Yu, Douglas L. Maskell, You Zhou

Abstract: Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted. These existing surveys did not cover the state-of-the-art (SOTA) NCO solvers emerged recently. More importantly, to provide a comprehensive taxonomy of NCO solvers with up-to-date coverage, based on our thorough review of relevant publicati… ▽ More Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted. These existing surveys did not cover the state-of-the-art (SOTA) NCO solvers emerged recently. More importantly, to provide a comprehensive taxonomy of NCO solvers with up-to-date coverage, based on our thorough review of relevant publications and preprints, we divide all NCO solvers into four distinct categories, namely Learning to Construct, Learning to Improve, Learning to Predict-Once, and Learning to Predict-Multiplicity solvers. Subsequently, we present the inadequacies of the SOTA solvers, including poor generalization, incapability to solve large-scale VRPs, inability to address most types of VRP variants simultaneously, and difficulty in comparing these NCO solvers with the conventional Operations Research algorithms. Simultaneously, we propose promising and viable directions to overcome these inadequacies. In addition, we compare the performance of representative NCO solvers from the Reinforcement, Supervised, and Unsupervised Learning paradigms across both small- and large-scale VRPs. Finally, following the proposed taxonomy, we provide an accompanying web page as a live repository for NCO solvers. Through this survey and the live repository, we hope to make the research community of NCO solvers for VRPs more thriving. △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20881 [pdf, other]

S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion

Authors: Haolong Ma, Hui Li, Chunyang Cheng, Gaoang Wang, Xiaoning Song, Xiaojun Wu

Abstract: As one of the tasks in Image Fusion, Infrared and Visible Image Fusion aims to integrate complementary information captured by sensors of different modalities into a single image. The Selective State Space Model (SSSM), known for its ability to capture long-range dependencies, has demonstrated its potential in the field of computer vision. However, in image fusion, current methods underestimate th… ▽ More As one of the tasks in Image Fusion, Infrared and Visible Image Fusion aims to integrate complementary information captured by sensors of different modalities into a single image. The Selective State Space Model (SSSM), known for its ability to capture long-range dependencies, has demonstrated its potential in the field of computer vision. However, in image fusion, current methods underestimate the potential of SSSM in capturing the global spatial information of both modalities. This limitation prevents the simultaneous consideration of the global spatial information from both modalities during interaction, leading to a lack of comprehensive perception of salient targets. Consequently, the fusion results tend to bias towards one modality instead of adaptively preserving salient targets. To address this issue, we propose the Saliency-aware Selective State Space Fusion Model (S4Fusion). In our S4Fusion, the designed Cross-Modal Spatial Awareness Module (CMSA) can simultaneously focus on global spatial information from both modalities while facilitating their interaction, thereby comprehensively capturing complementary information. Additionally, S4Fusion leverages a pre-trained network to perceive uncertainty in the fused images. By minimizing this uncertainty, S4Fusion adaptively highlights salient targets from both images. Extensive experiments demonstrate that our approach produces high-quality images and enhances performance in downstream tasks. △ Less

Submitted 3 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20849 [pdf, ps, other]

Locally Stationary Distributions: A Framework for Analyzing Slow-Mixing Markov Chains

Authors: Kuikui Liu, Sidhanth Mohanty, Prasad Raghavendra, Amit Rajaraman, David X. Wu

Abstract: Many natural Markov chains fail to mix to their stationary distribution in polynomially many steps. Often, this slow mixing is inevitable since it is computationally intractable to sample from their stationary measure. Nevertheless, Markov chains can be shown to always converge quickly to measures that are *locally stationary*, i.e., measures that don't change over a small number of steps. These… ▽ More Many natural Markov chains fail to mix to their stationary distribution in polynomially many steps. Often, this slow mixing is inevitable since it is computationally intractable to sample from their stationary measure. Nevertheless, Markov chains can be shown to always converge quickly to measures that are *locally stationary*, i.e., measures that don't change over a small number of steps. These locally stationary measures are analogous to local minima in continuous optimization, while stationary measures correspond to global minima. While locally stationary measures can be statistically far from stationary measures, do they enjoy provable theoretical guarantees that have algorithmic implications? We study this question in this work and demonstrate three algorithmic applications of locally stationary measures: 1. We show that Glauber dynamics on the hardcore model can be used to find independent sets of size $Ω\left(\frac{\log d}{d} \cdot n\right)$ in triangle-free graphs of degree at most $d$. 2. Let $W$ be a symmetric real matrix with bounded spectral diameter and $v$ be a unit vector. Given the matrix $M = λvv^\top + W$ with a planted rank-one spike along vector $v$, for sufficiently large constant $λ$, Glauber dynamics on the Ising model defined by $M$ samples vectors $x \in \{\pm 1\}^n$ that have constant correlation with the vector $v$. 3. Let $M = A_{\mathbf{G}} - \frac{d}{n}\mathbf{1}\mathbf{1}^\top$ be a centered version of the adjacency matrix where the graph $\mathbf{G}$ is drawn from a sparse 2-community stochastic block model. We show that for sufficiently large constant $λ$, Glauber dynamics on the Ising model defined by $M$ samples vectors $x \in \{\pm 1\}^n$ that have constant correlation with the hidden community vector $\mathbfσ$. △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: 34 pages

arXiv:2405.20764 [pdf, other]

CoMoFusion: Fast and High-quality Fusion of Infrared and Visible Image with Consistency Model

Authors: Zhiming Meng, Hui Li, Zeyang Zhang, Zhongwei Shen, Yunlong Yu, Xiaoning Song, Xiaojun Wu

Abstract: Generative models are widely utilized to model the distribution of fused images in the field of infrared and visible image fusion. However, current generative models based fusion methods often suffer from unstable training and slow inference speed. To tackle this problem, a novel fusion method based on consistency model is proposed, termed as CoMoFusion, which can generate the high-quality images… ▽ More Generative models are widely utilized to model the distribution of fused images in the field of infrared and visible image fusion. However, current generative models based fusion methods often suffer from unstable training and slow inference speed. To tackle this problem, a novel fusion method based on consistency model is proposed, termed as CoMoFusion, which can generate the high-quality images and achieve fast image inference speed. In specific, the consistency model is used to construct multi-modal joint features in the latent space with the forward and reverse process. Then, the infrared and visible features extracted by the trained consistency model are fed into fusion module to generate the final fused image. In order to enhance the texture and salient information of fused images, a novel loss based on pixel value selection is also designed. Extensive experiments on public datasets illustrate that our method obtains the SOTA fusion performance compared with the existing fusion methods. △ Less

Submitted 11 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20676 [pdf, other]

Search for $e^{+}e^{-}\toη'ψ(2S)$ at center-of-mass energies from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence lev… ▽ More Using data samples with an integrated luminosity of $4.67~\mathrm{fb}^{-1}$ collected by the BESIII detector operating at the BEPCII collider, we search for the process $e^+e^- \rightarrow η' ψ(2S)$ at center-of-mass energies from $4.66$ to $4.95~\mathrm{GeV}$. No significant signal is observed, and upper limits for the Born cross sections $σ^B(e^+e^-\rightarrowη'ψ(2S))$ at the 90\% confidence level are determined. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20654 [pdf, other]

Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

Authors: Xuyang Wu, Zhiyuan Peng, Krishna Sravanthi Rajanala Sai, Hsin-Tai Wu, Yi Fang

Abstract: Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks, recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, the performance is notably sensitive to the human-… ▽ More Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks, recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, the performance is notably sensitive to the human-written prompt (or hard prompt), and fine-tuning LLMs can be computationally intensive and time-consuming. Furthermore, this approach limits the leverage of question-passage relevance pairs and passage-specific knowledge to enhance the ranking capabilities of LLMs. In this paper, we propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT): a parameter-efficient method that fine-tunes learnable passage-specific soft prompts, incorporating passage-specific knowledge from a limited set of question-passage relevance pairs. The method involves ranking retrieved passages based on the log-likelihood of the model generating the question conditioned on each passage and the learned soft prompt. We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets and the results demonstrate the effectiveness of the proposed approach. △ Less

Submitted 20 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: Accepted at Gen-IR@SIGIR24

arXiv:2405.20646 [pdf, other]

Large Language Models Enhanced Sequential Recommendation for Long-tail User and Item

Authors: Qidong Liu, Xian Wu, Xiangyu Zhao, Ye**g Wang, Zijian Zhang, Feng Tian, Yefeng Zheng

Abstract: Sequential recommendation systems (SRS) serve the purpose of predicting users' subsequent preferences based on their past interactions and have been applied across various domains such as e-commerce and social networking platforms. However, practical SRS encounters challenges due to the fact that most users engage with only a limited number of items, while the majority of items are seldom consumed… ▽ More Sequential recommendation systems (SRS) serve the purpose of predicting users' subsequent preferences based on their past interactions and have been applied across various domains such as e-commerce and social networking platforms. However, practical SRS encounters challenges due to the fact that most users engage with only a limited number of items, while the majority of items are seldom consumed. These challenges, termed as the long-tail user and long-tail item dilemmas, often create obstacles for traditional SRS methods. Mitigating these challenges is crucial as they can significantly impact user satisfaction and business profitability. While some research endeavors have alleviated these issues, they still grapple with issues such as seesaw or noise stemming from the scarcity of interactions. The emergence of large language models (LLMs) presents a promising avenue to address these challenges from a semantic standpoint. In this study, we introduce the Large Language Models Enhancement framework for Sequential Recommendation (LLM-ESR), which leverages semantic embeddings from LLMs to enhance SRS performance without increasing computational overhead. To combat the long-tail item challenge, we propose a dual-view modeling approach that fuses semantic information from LLMs with collaborative signals from traditional SRS. To address the long-tail user challenge, we introduce a retrieval augmented self-distillation technique to refine user preference representations by incorporating richer interaction data from similar users. Through comprehensive experiments conducted on three authentic datasets using three widely used SRS models, our proposed enhancement framework demonstrates superior performance compared to existing methodologies. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20638 [pdf, other]

Study of the decays $χ_{cJ} \rightarrow Λ\barΛφ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (637 additional authors not shown)

Abstract: Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured t… ▽ More Based on $(2712.4 \pm 14.3) \times 10^{6}$ $ e^{+}e^{-}\toψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we report the first evidence of $χ_{c0}\to Λ\bar Λφ$ decays and the first observation of $χ_{c1,2}\to Λ\bar Λφ$ decays, with significances of $4.5σ$, $11.3σ$ and $13.0σ$, respectively. The decay branching fractions of $χ_{c0,1,2}\to Λ\bar Λφ$ are measured to be $( 2.99\pm1.24\pm0.19) \times 10^{-5}$, $(6.01\pm0.90\pm0.40 )\times 10^{-5}$, and $(7.13\pm0.81\pm0.36) \times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No obvious enhancement near the $Λ\barΛ$ production threshold or excited $Λ$ state is found in the $Λφ$ (or $\barΛφ$) system. △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: 10 pages, 9 figures

arXiv:2405.20607 [pdf, other]

Textual Inversion and Self-supervised Refinement for Radiology Report Generation

Authors: Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang

Abstract: Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we proposed Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specific… ▽ More Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we proposed Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specifically, textual inversion can project text and image into the same space by representing images as pseudo words to eliminate the cross-modeling gap. Subsequently, self-supervised refinement refines these pseudo words through contrastive loss computation between images and texts, enhancing the fidelity of generated reports to images. Notably, TISR is orthogonal to most existing methods, plug-and-play. We conduct experiments on two widely-used public datasets and achieve significant improvements on various baselines, which demonstrates the effectiveness and generalization of TISR. The code will be available soon. △ Less

Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: This paper has been early accepted by MICCAI 2024!

arXiv:2405.19931 [pdf, other]

Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks

Authors: Xiaoyu Wu, Jiaru Zhang, Yang Hua, Bohan Lyu, Hao Wang, Tao Song, Haibing Guan

Abstract: Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications. However, we explore the training dynamics of DMs and observe an unanticipated phenomenon: during the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with… ▽ More Few-shot fine-tuning of Diffusion Models (DMs) is a key advancement, significantly reducing training costs and enabling personalized AI applications. However, we explore the training dynamics of DMs and observe an unanticipated phenomenon: during the training process, image fidelity initially improves, then unexpectedly deteriorates with the emergence of noisy patterns, only to recover later with severe overfitting. We term the stage with generated noisy patterns as corruption stage. To understand this corruption stage, we begin by theoretically modeling the one-shot fine-tuning scenario, and then extend this modeling to more general cases. Through this modeling, we identify the primary cause of this corruption stage: a narrowed learning distribution inherent in the nature of few-shot fine-tuning. To tackle this, we apply Bayesian Neural Networks (BNNs) on DMs with variational inference to implicitly broaden the learned distribution, and present that the learning target of the BNNs can be naturally regarded as an expectation of the diffusion loss and a further regularization with the pretrained DMs. This approach is highly compatible with current few-shot fine-tuning methods in DMs and does not introduce any extra inference costs. Experimental results demonstrate that our method significantly mitigates corruption, and improves the fidelity, quality and diversity of the generated images in both object-driven and subject-driven generation tasks. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: Preprint. Under review

arXiv:2405.19846 [pdf, other]

Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

Authors: Chaochen Gao, Xing Wu, Qi Fu, Songlin Hu

Abstract: Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts. However, obtaining effective long-context data is challenging due to the scarcity and uneven distribution of long documents across different domains. To address this issue, we propose a Query-centric data synthesis method, abbreviated… ▽ More Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts. However, obtaining effective long-context data is challenging due to the scarcity and uneven distribution of long documents across different domains. To address this issue, we propose a Query-centric data synthesis method, abbreviated as Quest. Quest is an interpretable method based on the observation that documents retrieved by similar queries are relevant but low-redundant, thus well-suited for synthesizing long-context data. The method is also scalable and capable of constructing large amounts of long-context data. Using Quest, we synthesize a long-context dataset up to 128k context length, significantly outperforming other data synthesis methods on multiple long-context benchmark datasets. In addition, we further verify that the Quest method is predictable through scaling law experiments, making it a reliable solution for advancing long-context models. △ Less

Submitted 19 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19796 [pdf, other]

Explainable Attribute-Based Speaker Verification

Authors: Xiaoliang Wu, Chau Luu, Peter Bell, Ajitha Rajan

Abstract: This paper proposes a fully explainable approach to speaker verification (SV), a task that fundamentally relies on individual speaker characteristics. The opaque use of speaker attributes in current SV systems raises concerns of trust. Addressing this, we propose an attribute-based explainable SV system that identifies speakers by comparing personal attributes such as gender, nationality, and age… ▽ More This paper proposes a fully explainable approach to speaker verification (SV), a task that fundamentally relies on individual speaker characteristics. The opaque use of speaker attributes in current SV systems raises concerns of trust. Addressing this, we propose an attribute-based explainable SV system that identifies speakers by comparing personal attributes such as gender, nationality, and age extracted automatically from voice recordings. We believe this approach better aligns with human reasoning, making it more understandable than traditional methods. Evaluated on the Voxceleb1 test set, the best performance of our system is comparable with the ground truth established when using all correct attributes, proving its efficacy. Whilst our approach sacrifices some performance compared to non-explainable methods, we believe that it moves us closer to the goal of transparent, interpretable AI and lays the groundwork for future enhancements through attribute expansion. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19789 [pdf, other]

Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning

Authors: Guogang Zhu, Xuefeng Liu, Xinghao Wu, Shaojie Tang, Chao Tang, Jianwei Niu, Hao Su

Abstract: Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model.In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards some certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the mo… ▽ More Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model.In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards some certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the models themselves are biased, merely constraining their consistency is not sufficient to alleviate prediction bias. In this paper, we explore this bias from a Bayesian perspective and demonstrate that it principally originates from label prior bias within the training data. Building upon this insight, we propose a debiasing method for FSSL named FedDB. FedDB utilizes the Average Prediction Probability of Unlabeled Data (APP-U) to approximate the biased prior.During local training, FedDB employs APP-U to refine pseudo-labeling through Bayes' theorem, thereby significantly reducing the label prior bias. Concurrently, during the model aggregation, FedDB uses APP-U from participating clients to formulate unbiased aggregate weights, thereby effectively diminishing bias in the global model. Experimental results show that FedDB can surpass existing FSSL methods. The code is available at https://github.com/GuogangZhu/FedDB. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024

arXiv:2405.19723 [pdf, other]

Encoding and Controlling Global Semantics for Long-form Video Question Answering

Authors: Thong Thanh Nguyen, Zhiyuan Hu, Xiaobao Wu, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

Abstract: Seeking answers effectively for long videos is essential to build video question answering (videoQA) systems. Previous methods adaptively select frames and regions from long videos to save computations. However, this fails to reason over the whole sequence of video, leading to sub-optimal performance. To address this problem, we introduce a state space layer (SSL) into multi-modal Transformer to e… ▽ More Seeking answers effectively for long videos is essential to build video question answering (videoQA) systems. Previous methods adaptively select frames and regions from long videos to save computations. However, this fails to reason over the whole sequence of video, leading to sub-optimal performance. To address this problem, we introduce a state space layer (SSL) into multi-modal Transformer to efficiently integrate global semantics of the video, which mitigates the video information loss caused by frame and region selection modules. Our SSL includes a gating unit to enable controllability over the flow of global semantics into visual representations. To further enhance the controllability, we introduce a cross-modal compositional congruence (C^3) objective to encourage global semantics aligned with the question. To rigorously evaluate long-form videoQA capacity, we construct two new benchmarks Ego-QA and MAD-QA featuring videos of considerably long length, i.e. 17.5 minutes and 1.9 hours, respectively. Extensive experiments demonstrate the superiority of our framework on these new as well as existing datasets. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.19689 [pdf, other]

Uncertainty-aware sign language video retrieval with probability distribution modeling

Authors: Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu

Abstract: Sign language video retrieval plays a key role in facilitating information access for the deaf community. Despite significant advances in video-text retrieval, the complexity and inherent uncertainty of sign language preclude the direct application of these techniques. Previous methods achieve the map** between sign language video and text through fine-grained modal alignment. However, due to th… ▽ More Sign language video retrieval plays a key role in facilitating information access for the deaf community. Despite significant advances in video-text retrieval, the complexity and inherent uncertainty of sign language preclude the direct application of these techniques. Previous methods achieve the map** between sign language video and text through fine-grained modal alignment. However, due to the scarcity of fine-grained annotation, the uncertainty inherent in sign language video is underestimated, limiting the further development of sign language retrieval tasks. To address this challenge, we propose a novel Uncertainty-aware Probability Distribution Retrieval (UPRet), that conceptualizes the map** process of sign language video and text in terms of probability distributions, explores their potential interrelationships, and enables flexible map**s. Experiments on three benchmarks demonstrate the effectiveness of our method, which achieves state-of-the-art results on How2Sign (59.1%), PHOENIX-2014T (72.0%), and CSL-Daily (78.4%). △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19419 [pdf, other]

Supernova Electron-Neutrino Interactions with Xenon in the nEXO Detector

Authors: nEXO Collaboration, S. Hedges, S. Al Kharusi, E. Angelico, J. P. Brodsky, G. Richardson, S. Wilde, A. Amy, A. Anker, I. J. Arnquist, P. Arsenault, A. Atencio, I. Badhrees, J. Bane, V. Belov, E. P. Bernard, T. Bhatta, A. Bolotnikov, J. Breslin, P. A. Breur, E. Brown, T. Brunner, E. Caden, G. F. Cao, L. Q. Cao , et al. (121 additional authors not shown)

Abstract: Electron-neutrino charged-current interactions with xenon nuclei were modeled in the nEXO neutrinoless double-beta decay detector (~5-tonne, 90% ${}^{136}$Xe, 10% ${}^{134}$Xe) to evaluate its sensitivity to supernova neutrinos. Predictions for event rates and detectable signatures were modeled using the MARLEY event generator. We find good agreement between MARLEY's predictions and existing theor… ▽ More Electron-neutrino charged-current interactions with xenon nuclei were modeled in the nEXO neutrinoless double-beta decay detector (~5-tonne, 90% ${}^{136}$Xe, 10% ${}^{134}$Xe) to evaluate its sensitivity to supernova neutrinos. Predictions for event rates and detectable signatures were modeled using the MARLEY event generator. We find good agreement between MARLEY's predictions and existing theoretical calculations of the inclusive cross sections at supernova neutrino energies. The interactions modeled by MARLEY were simulated within the nEXO simulation framework and were run through an example reconstruction algorithm to determine the detector's efficiency for reconstructing these events. The simulated data, incorporating the detector response, were used to study the ability of nEXO to reconstruct the incident electron-neutrino spectrum and these results were extended to a larger xenon detector of the same isotope enrichment. We estimate that nEXO will be able to observe electron-neutrino interactions with xenon from supernovae as far as 5 to 8 kpc from earth, while the ability to reconstruct incident electron-neutrino spectrum parameters from observed interactions in nEXO is limited to closer supernovae. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 17 pages, 16 figures

Report number: LLNL-JRNL-864783-DRAFT

arXiv:2405.19291 [pdf, other]

Grasp as You Say: Language-guided Dexterous Grasp Generation

Authors: Yi-Lin Wei, Jian-Jian Jiang, Chengyi Xing, Xiantuo Tan, Xiao-Ming Wu, Hao Li, Mark Cutkosky, Wei-Shi Zheng

Abstract: This paper explores a novel task ""Dexterous Grasp as You Say"" (DexGYS), enabling robots to perform dexterous gras** based on human commands expressed in natural language. However, the development of this field is hindered by the lack of datasets with natural human guidance; thus, we propose a language-guided dexterous grasp dataset, named DexGYSNet, offering high-quality dexterous grasp annota… ▽ More This paper explores a novel task ""Dexterous Grasp as You Say"" (DexGYS), enabling robots to perform dexterous gras** based on human commands expressed in natural language. However, the development of this field is hindered by the lack of datasets with natural human guidance; thus, we propose a language-guided dexterous grasp dataset, named DexGYSNet, offering high-quality dexterous grasp annotations along with flexible and fine-grained human language guidance. Our dataset construction is cost-efficient, with the carefully-design hand-object interaction retargeting strategy, and the LLM-assisted language guidance annotation system. Equipped with this dataset, we introduce the DexGYSGrasp framework for generating dexterous grasps based on human language instructions, with the capability of producing grasps that are intent-aligned, high quality and diversity. To achieve this capability, our framework decomposes the complex learning process into two manageable progressive objectives and introduce two components to realize them. The first component learns the grasp distribution focusing on intention alignment and generation diversity. And the second component refines the grasp quality while maintaining intention consistency. Extensive experiments are conducted on DexGYSNet and real world environment for validation. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 9 pages, 7 figures

arXiv:2405.19119 [pdf, other]

Can Graph Learning Improve Task Planning?

Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li

Abstract: Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, t… ▽ More Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, task planning is a decision-making problem that involves selecting a connected path or subgraph within the corresponding graph and invoking it. In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design. Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs, which is adeptly addressed by graph neural networks (GNNs). This theoretical insight led us to integrate GNNs with LLMs to enhance overall performance. Extensive experiments demonstrate that GNN-based methods surpass existing solutions even without training, and minimal training can further enhance their performance. Additionally, our approach complements prompt engineering and fine-tuning techniques, with performance further enhanced by improved prompts or a fine-tuned model. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18897 [pdf, other]

MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning

Authors: Junjie Wang, Guang**g Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Qicheng Lao

Abstract: In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely in… ▽ More In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely increasing their rank. To address these issues, a natural idea is to enhance the independence and diversity of the learning process for the low-rank matrices. Therefore, we propose Masked LoRA Experts (MLAE), an innovative approach that applies the concept of masking to PEFT. Our method incorporates a cellular decomposition strategy that transforms a low-rank matrix into independent rank-1 submatrices, or ``experts'', thus enhancing independence. Additionally, we introduce a binary mask matrix that selectively activates these experts during training to promote more diverse and anisotropic learning, based on expert-level dropout strategies. Our investigations reveal that this selective activation not only enhances performance but also fosters a more diverse acquisition of knowledge with a marked decrease in parameter similarity among MLAE, significantly boosting the quality of the model while barely increasing the parameter count. Remarkably, MLAE achieves new SOTA performance with an average accuracy score of 78.8% on the VTAB-1k benchmark and 90.9% on the FGVC benchmark, demonstrating superior performance. Our code is available at https://github.com/jie040109/MLAE. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: Tech report

arXiv:2405.18781 [pdf, other]

On the Role of Attention Masks and LayerNorm in Transformers

Authors: Xinyi Wu, Amir Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie

Abstract: Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical… ▽ More Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical components in transformers that may alleviate the rank collapse issue. In this paper, we provide a general analysis of rank collapse under self-attention, taking into account the effects of attention masks and layer normalization (LayerNorm). In particular, we find that although pure masked attention still suffers from exponential collapse to a rank one subspace, local masked attention can provably slow down the collapse rate. In the case of self-attention with LayerNorm, we first show that for certain classes of value matrices, collapse to a rank one subspace still happens exponentially. However, through construction of nontrivial counterexamples, we then establish that with proper choice of value matrices, a general class of sequences may not converge to a rank one subspace, and the self-attention dynamics with LayerNorm can simultaneously possess a rich set of equilibria with any possible rank between one and full. Our result refutes the previous hypothesis that LayerNorm plays no role in the rank collapse of self-attention and suggests that self-attention with LayerNorm constitutes a much more expressive, versatile nonlinear dynamical system than what was originally thought. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18481 [pdf, ps, other]

Pristine and Pseudo-gapped Boundaries of the Deconfined Quantum Critical Points

Authors: Nayan Myerson-Jain, Xiao-Chuan Wu, Cenke Xu

Abstract: Bulk topology and criticality can both lead to nontrivial boundary effects. Topological orders are often characterized by their robust edge states, while bulk critical points can have different boundary scalings governed by boundary conditions. The interplay between these two different boundary effects is an intriguing problem. The boundary of the deconfined quantum critical point (DQCP) is the id… ▽ More Bulk topology and criticality can both lead to nontrivial boundary effects. Topological orders are often characterized by their robust edge states, while bulk critical points can have different boundary scalings governed by boundary conditions. The interplay between these two different boundary effects is an intriguing problem. The boundary of the deconfined quantum critical point (DQCP) is the ideal platform for the interplay of the two boundary effects, as the DQCP is also an intrinsically gapless symmetry protected topological (igSPT) state. In this work we discuss the boundary of several analogues of the DQCP. We demonstrate that the fluctuation of the bulk order parameters and their various boundary conditions lead to a rich possibility of the edge states, including a "pseudogap" (or super power-law decay) behavior. We also discuss the quantum information perspective of our work, i.e. the implication of our results on DQCP under weak-measurement. Weak-measurement followed by post-selection can change the boundary condition at the temporal boundary in the path-integral representation of a density matrix, which will lead to different behaviors of the "strange correlator". △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 9 pages

arXiv:2405.18194 [pdf, other]

Delving into Differentially Private Transformer

Authors: Youlong Ding, Xueyang Wu, Yining Meng, Yonggang Luo, Hao Wang, Weike Pan

Abstract: Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training DP Transformer to the mor… ▽ More Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such `reduction' is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clip**. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clip**, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning. △ Less

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.18124 [pdf, other]

Dual-Path Multi-Scale Transformer for High-Quality Image Deraining

Authors: Huiling Zhou, Xianhao Wu, Hongming Chen

Abstract: Despite the superiority of convolutional neural networks (CNNs) and Transformers in single-image rain removal, current multi-scale models still face significant challenges due to their reliance on single-scale feature pyramid patterns. In this paper, we propose an effective rain removal method, the dual-path multi-scale Transformer (DPMformer) for high-quality image reconstruction by leveraging ri… ▽ More Despite the superiority of convolutional neural networks (CNNs) and Transformers in single-image rain removal, current multi-scale models still face significant challenges due to their reliance on single-scale feature pyramid patterns. In this paper, we propose an effective rain removal method, the dual-path multi-scale Transformer (DPMformer) for high-quality image reconstruction by leveraging rich multi-scale information. This method consists of a backbone path and two branch paths from two different multi-scale approaches. Specifically, one path adopts the coarse-to-fine strategy, progressively downsampling the image to 1/2 and 1/4 scales, which helps capture fine-scale potential rain information fusion. Simultaneously, we employ the multi-patch stacked model (non-overlap** blocks of size 2 and 4) to enrich the feature information of the deep network in the other path. To learn a richer blend of features, the backbone path fully utilizes the multi-scale information to achieve high-quality rain removal image reconstruction. Extensive experiments on benchmark datasets demonstrate that our method achieves promising performance compared to other state-of-the-art methods. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17978 [pdf, other]

FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm

Authors: Xiaobao Wu, Thong Nguyen, Delvin Ce Zhang, William Yang Wang, Anh Tuan Luu

Abstract: Topic models have been evolving rapidly over the years, from conventional to recent neural models. However, existing topic models generally struggle with either effectiveness, efficiency, or stability, highly impeding their practical applications. In this paper, we propose FASTopic, a fast, adaptive, stable, and transferable topic model. FASTopic follows a new paradigm: Dual Semantic-relation Reco… ▽ More Topic models have been evolving rapidly over the years, from conventional to recent neural models. However, existing topic models generally struggle with either effectiveness, efficiency, or stability, highly impeding their practical applications. In this paper, we propose FASTopic, a fast, adaptive, stable, and transferable topic model. FASTopic follows a new paradigm: Dual Semantic-relation Reconstruction (DSR). Instead of previous conventional, neural VAE-based or clustering-based methods, DSR discovers latent topics by reconstruction through modeling the semantic relations among document, topic, and word embeddings. This brings about a neat and efficient topic modeling framework. We further propose a novel Embedding Transport Plan (ETP) method. Rather than early straightforward approaches, ETP explicitly regularizes the semantic relations as optimal transport plans. This addresses the relation bias issue and thus leads to effective topic modeling. Extensive experiments on benchmark datasets demonstrate that our FASTopic shows superior effectiveness, efficiency, adaptivity, stability, and transferability, compared to state-of-the-art baselines across various scenarios. Our code is available at https://github.com/bobxwu/FASTopic . △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17957 [pdf, other]

Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion

Authors: Xiaobao Wu, Xinshuai Dong, Liangming Pan, Thong Nguyen, Anh Tuan Luu

Abstract: Dynamic topic models track the evolution of topics in sequential documents, which have derived various applications like trend analysis and opinion mining. However, existing models suffer from repetitive topic and unassociated topic issues, failing to reveal the evolution and hindering further applications. To address these issues, we break the tradition of simply chaining topics in existing work… ▽ More Dynamic topic models track the evolution of topics in sequential documents, which have derived various applications like trend analysis and opinion mining. However, existing models suffer from repetitive topic and unassociated topic issues, failing to reveal the evolution and hindering further applications. To address these issues, we break the tradition of simply chaining topics in existing work and propose a novel neural \modelfullname. We introduce a new evolution-tracking contrastive learning method that builds the similarity relations among dynamic topics. This not only tracks topic evolution but also maintains topic diversity, mitigating the repetitive topic issue. To avoid unassociated topics, we further present an unassociated word exclusion method that consistently excludes unassociated words from discovered topics. Extensive experiments demonstrate our model significantly outperforms state-of-the-art baselines, tracking topic evolution with high-quality topics, showing better performance on downstream tasks, and remaining robust to the hyperparameter for evolution intensities. Our code is available at https://github.com/bobxwu/CFDTM . △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted to ACL 2024 Findings

arXiv:2405.17734 [pdf, other]

Towards Efficient Disaster Response via Cost-effective Unbiased Class Rate Estimation through Neyman Allocation Stratified Sampling Active Learning

Authors: Yanbing Bai, Xinyi Wu, Lai Xu, Jihan Pei, Erick Mas, Shunichi Koshimura

Abstract: With the rapid development of earth observation technology, we have entered an era of massively available satellite remote-sensing data. However, a large amount of satellite remote sensing data lacks a label or the label cost is too high to hinder the potential of AI technology mining satellite data. Especially in such an emergency response scenario that uses satellite data to evaluate the degree… ▽ More With the rapid development of earth observation technology, we have entered an era of massively available satellite remote-sensing data. However, a large amount of satellite remote sensing data lacks a label or the label cost is too high to hinder the potential of AI technology mining satellite data. Especially in such an emergency response scenario that uses satellite data to evaluate the degree of disaster damage. Disaster damage assessment encountered bottlenecks due to excessive focus on the damage of a certain building in a specific geographical space or a certain area on a larger scale. In fact, in the early days of disaster emergency response, government departments were more concerned about the overall damage rate of the disaster area instead of single-building damage, because this helps the government decide the level of emergency response. We present an innovative algorithm that constructs Neyman stratified random sampling trees for binary classification and extends this approach to multiclass problems. Through extensive experimentation on various datasets and model structures, our findings demonstrate that our method surpasses both passive and conventional active learning techniques in terms of class rate estimation and model enhancement with only 30\%-60\% of the annotation cost of simple sampling. It effectively addresses the 'sampling bias' challenge in traditional active learning strategies and mitigates the 'cold start' dilemma. The efficacy of our approach is further substantiated through application to disaster evaluation tasks using Xview2 Satellite imagery, showcasing its practical utility in real-world contexts. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17024 [pdf]

Beware of Overestimated Decoding Performance Arising from Temporal Autocorrelations in Electroencephalogram Signals

Authors: Xiran Xu, Bo Wang, Boda Xiao, Yadong Niu, Yiwen Wang, Xihong Wu, **g Chen

Abstract: Researchers have reported high decoding accuracy (>95%) using non-invasive Electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks like image decoding, emotion recognition, auditory spatial attention detection, etc. Since these EEG data were usually collected with well-designed paradigms in labs, the reliability and robustness of the corresponding decoding methods were… ▽ More Researchers have reported high decoding accuracy (>95%) using non-invasive Electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks like image decoding, emotion recognition, auditory spatial attention detection, etc. Since these EEG data were usually collected with well-designed paradigms in labs, the reliability and robustness of the corresponding decoding methods were doubted by some researchers, and they argued that such decoding accuracy was overestimated due to the inherent temporal autocorrelation of EEG signals. However, the coupling between the stimulus-driven neural responses and the EEG temporal autocorrelations makes it difficult to confirm whether this overestimation exists in truth. Furthermore, the underlying pitfalls behind overestimated decoding accuracy have not been fully explained due to a lack of appropriate formulation. In this work, we formulate the pitfall in various EEG decoding tasks in a unified framework. EEG data were recorded from watermelons to remove stimulus-driven neural responses. Labels were assigned to continuous EEG according to the experimental design for EEG recording of several typical datasets, and then the decoding methods were conducted. The results showed the label can be successfully decoded as long as continuous EEG data with the same label were split into training and test sets. Further analysis indicated that high accuracy of various BCI decoding tasks could be achieved by associating labels with EEG intrinsic temporal autocorrelation features. These results underscore the importance of choosing the right experimental designs and data splits in BCI decoding tasks to prevent inflated accuracies due to EEG temporal autocorrelation. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16435 [pdf, other]

Structure-aware Semantic Node Identifiers for Learning on Graphs

Authors: Yuankai Luo, Qijiong Liu, Lei Shi, Xiao-Ming Wu

Abstract: We present a novel graph tokenization framework that generates structure-aware, semantic node identifiers (IDs) in the form of a short sequence of discrete codes, serving as symbolic representations of nodes. We employs vector quantization to compress continuous node embeddings from multiple layers of a graph neural network (GNN), into compact, meaningful codes, under both self-supervised and supe… ▽ More We present a novel graph tokenization framework that generates structure-aware, semantic node identifiers (IDs) in the form of a short sequence of discrete codes, serving as symbolic representations of nodes. We employs vector quantization to compress continuous node embeddings from multiple layers of a graph neural network (GNN), into compact, meaningful codes, under both self-supervised and supervised learning paradigms. The resulting node IDs capture a high-level abstraction of graph data, enhancing the efficiency and interpretability of GNNs. Through extensive experiments on 34 datasets, including node classification, graph classification, link prediction, and attributed graph clustering tasks, we demonstrate that our generated node IDs not only improve computational efficiency but also achieve competitive performance compared to current state-of-the-art methods. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.15487 [pdf, ps, other]

doi 10.1007/s11433-023-2342-4

Principal components of nuclear mass models

Authors: X. H. Wu, P. W. Zhao

Abstract: The principal component analysis approach is employed to extract the principal components contained in nuclear mass models for the first time. The effects coming from different nuclear mass models are reintegrated and reorganized in the extracted principal components. These extracted principal components are recombined to build new mass models, which achieve better accuracy than the original theor… ▽ More The principal component analysis approach is employed to extract the principal components contained in nuclear mass models for the first time. The effects coming from different nuclear mass models are reintegrated and reorganized in the extracted principal components. These extracted principal components are recombined to build new mass models, which achieve better accuracy than the original theoretical mass models. This indicates that the effects contained in different mass models can work together to improve the nuclear mass predictions with the help of the principal component analysis approach. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 8 pages, 3 figures, 1 table

Journal ref: Sci. China Phys. Mech. Astron. 67, 272011 (2024)

arXiv:2405.14709 [pdf, other]

OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance

Authors: Shuheng Ge, Haoyu Xing, Li Zhang, Xiangqian Wu

Abstract: Creating realistic, natural, and lip-readable talking face videos remains a formidable challenge. Previous research primarily concentrated on generating and aligning single-frame images while overlooking the smoothness of frame-to-frame transitions and temporal dependencies. This often compromised visual quality and effects in practical settings, particularly when handling complex facial data and… ▽ More Creating realistic, natural, and lip-readable talking face videos remains a formidable challenge. Previous research primarily concentrated on generating and aligning single-frame images while overlooking the smoothness of frame-to-frame transitions and temporal dependencies. This often compromised visual quality and effects in practical settings, particularly when handling complex facial data and audio content, which frequently led to semantically incongruent visual illusions. Specifically, synthesized videos commonly featured disorganized lip movements, making them difficult to understand and recognize. To overcome these limitations, this paper introduces the application of optical flow to guide facial image generation, enhancing inter-frame continuity and semantic consistency. We propose "OpFlowTalker", a novel approach that utilizes predicted optical flow changes from audio inputs rather than direct image predictions. This method smooths image transitions and aligns changes with semantic content. Moreover, it employs a sequence fusion technique to replace the independent generation of single frames, thus preserving contextual information and maintaining temporal coherence. We also developed an optical flow synchronization module that regulates both full-face and lip movements, optimizing visual synthesis by balancing regional dynamics. Furthermore, we introduce a Visual Text Consistency Score (VTCS) that accurately measures lip-readability in synthesized videos. Extensive empirical evidence validates the effectiveness of our approach. △ Less

Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13578 [pdf, other]

ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation

Authors: Weilong Dong, Xinwei Wu, Renren **, Shaoyang Xu, Deyi Xiong

Abstract: Ensuring large language models (LLM) behave consistently with human goals, values, and intentions is crucial for their safety but yet computationally expensive. To reduce the computational cost of alignment training of LLMs, especially for those with a huge number of parameters, and to reutilize learned value alignment, we propose ConTrans, a novel framework that enables weak-to-strong alignment t… ▽ More Ensuring large language models (LLM) behave consistently with human goals, values, and intentions is crucial for their safety but yet computationally expensive. To reduce the computational cost of alignment training of LLMs, especially for those with a huge number of parameters, and to reutilize learned value alignment, we propose ConTrans, a novel framework that enables weak-to-strong alignment transfer via concept transplantation. From the perspective of representation engineering, ConTrans refines concept vectors in value alignment from a source LLM (usually a weak yet aligned LLM). The refined concept vectors are then reformulated to adapt to the target LLM (usually a strong yet unaligned base LLM) via affine transformation. In the third step, ConTrans transplants the reformulated concept vectors into the residual stream of the target LLM. Experiments demonstrate the successful transplantation of a wide range of aligned concepts from 7B models to 13B and 70B models across multiple LLMs and LLM families. Remarkably, ConTrans even surpasses instruction-tuned models in terms of truthfulness. Experiment results validate the effectiveness of both inter-LLM-family and intra-LLM-family concept transplantation. Our work successfully demonstrates an alternative way to achieve weak-to-strong alignment generalization and control. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13315 [pdf, other]

Study of the decays $χ_{cJ}\toΛ\barΛω$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$,… ▽ More Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, we present the first observation of the decays $χ_{cJ}\toΛ\barΛω$, where $J=0, 1, 2$, with statistical significances of $11.7 σ, 11.2 σ$, and $11.8 σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\toΛ\barΛω)=({2.37 \pm 0.22 \pm 0.23}) \times 10^{-4}$, $\mathcal{B}(χ_{c1}\toΛ\barΛω)=({1.01 \pm 0.10 \pm 0.11}) \times 10^{-4}$, and $\mathcal{B}(χ_{c2}\toΛ\barΛω)=({1.40 \pm 0.13 \pm 0.17}) \times 10^{-4}$, where the first uncertainties are statistical and the second are systematic. We observe no clear intermediate structures. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 11 pages, 10 figures

arXiv:2405.12809 [pdf, other]

Precision measurement of the branching fraction of \boldmath $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (604 additional authors not shown)

Abstract: Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$. The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with sig… ▽ More Using a sample of $448.1 \times 10^6$ $ψ(2S)$ events collected with the BESIII detector, we perform a study of the decay $J/ψ\rightarrow K^+K^-$ via $ψ(2S)\rightarrow π^+π^-J/ψ$. The branching fraction of $J/ψ\rightarrow K^+K^-$ is determined to be $\mathcal{B}_{K^+K^-}=(3.072\pm 0.023({\rm stat.})\pm 0.050({\rm syst.}))\times 10^{-4}$, which is consistent with previous measurements but with significantly improved precision. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: to be submitted to PRD

arXiv:2405.12592 [pdf]

Spin-polarized p-wave superconductivity in the kagome material RbV$_3$Sb$_5$

Authors: Shuo Wang, Xilin Feng, **g-Zhi Fang, Jia-Peng Peng, Zi-Ting Sun, Jia-Jie Yang, **gchao Liu, Jia-Ji Zhao, Jian-Kun Wang, Xin-Jie Liu, Ze-Nan Wu, Shengbiao Sun, Ning Kang, Xiao-Song Wu, Zhensheng Zhang, Xuewen Fu, Kam Tuen Law, Ben-Chuan Lin, Dapeng Yu

Abstract: The study of kagome materials has attracted much attention in the past few years due to the presence of many electron-electron interaction-driven phases in a single material.In this work, we report the discovery of intrinsic spin-polarized p-wave superconductivity in the thin-flake kagome material RbV$_3$Sb$_5$. Firstly, when an in-plane magnetic field is swept in opposite directions, we observe a… ▽ More The study of kagome materials has attracted much attention in the past few years due to the presence of many electron-electron interaction-driven phases in a single material.In this work, we report the discovery of intrinsic spin-polarized p-wave superconductivity in the thin-flake kagome material RbV$_3$Sb$_5$. Firstly, when an in-plane magnetic field is swept in opposite directions, we observe a unique form of hysteresis in magnetoresistance which is different from the hysteresis induced by extrinsic mechanisms such as flux-trap** or superheating and supercooling effects. The unconventional hysteresis indicates the emergence of an intrinsic time-reversal symmetry-breaking superconducting phase. Strikingly, at a fixed magnetic field, the finite-resistance state can be quenched to the zero-resistance state by applying a large current. Secondly, at temperatures around 400 mK, the re-entrance of superconductivity occurs during an in-plane field-swee** process with a fixed swee** direction. This kind of re-entrance is asymmetric about the zero field axis and observed in all field directions for a fixed current direction, which is different from the re-entrance observed in conventional superconductors. Moreover, the angle-dependent in-plane critical field measurements reveal a two-fold symmetry that deviates from the original, centrosymmetric D$_{6h}$ point group symmetry of the crystal. These findings put very strong constraints on the possible superconducting pairing symmetry of RbV$_3$Sb$_5$. We point out that the pairing symmetry, which is consistent with the crystal symmetry and all the observed novel properties, is a time-reversal symmetry-breaking, p-wave pairing with net spin polarization. Importantly, this p-wave pairing gives rise to a nodal topological superconducting state with Majorana flat bands on the sample edges. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 21 pages, 4 figures

arXiv:2405.11921 [pdf, other]

MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections

Authors: Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan

Abstract: 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting.… ▽ More 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting. The key insight is grounded on the mirror symmetry between the real-world space and the virtual mirror space. We introduce an intuitive dual-rendering strategy that enables differentiable rasterization of both the real-world 3D Gaussians and the mirrored counterpart obtained by reflecting the former about the mirror plane. All 3D Gaussians are jointly optimized with the mirror plane in an end-to-end framework. MirrorGaussian achieves high-quality and real-time rendering in scenes with mirrors, empowering scene editing like adding new mirrors and objects. Comprehensive experiments on multiple datasets demonstrate that our approach significantly outperforms existing methods, achieving state-of-the-art results. Project page: https://mirror-gaussian.github.io/. △ Less

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To… ▽ More The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively. △ Less

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11750 [pdf, other]

The Intermediate-Mass Black Hole Reverberation Map** Project: Initial Results for a candidate IMBH in a nearby Seyfert 1 Galaxy

Authors: Wenwen Zuo, Hengxiao Guo, **gbo Sun, Qi Yuan, Paulina Lira, Minfeng Gu, Philip G. Edwards, Alok C. Gupta, Shubham Kishore, Jamie Stevens, Tao An, Zhen-Yi Cai, Haicheng Feng, Luis C. Ho, Dragana Ilić, Andjelka B. Kovačević, ShaSha Li, Mar Mezcua, Luka Č. Popović, Mouyuan Sun, Tushar Tripathi, Vivian U., Oliver Vince, Jianguo Wang, Junxian Wang , et al. (3 additional authors not shown)

Abstract: To investigate the short-term variability and determine the size of the optical continuum emitting size of intermediate-mass black holes (IMBHs), we carried out high-cadence, multi-band photometric monitoring of a Seyfert 1 galaxy J0249-0815 across two nights, together with a one-night single-band preliminary test. The presence of the broad Ha component in our target was confirmed by recent Paloma… ▽ More To investigate the short-term variability and determine the size of the optical continuum emitting size of intermediate-mass black holes (IMBHs), we carried out high-cadence, multi-band photometric monitoring of a Seyfert 1 galaxy J0249-0815 across two nights, together with a one-night single-band preliminary test. The presence of the broad Ha component in our target was confirmed by recent Palomar/P200 spectroscopic observations, 23 years after Sloan Digital Sky Survey, ruling out the supernovae origin of the broad Ha line. The photometric experiment was primarily conducted utilizing four-channel imagers MuSCAT 3 & 4 mounted on 2-meter telescopes within the Las Cumbres Observatory Global Telescope Network. Despite the expectation of variability, we observed no significant variation (<1.4%) on timescales of 6-10 hours. This non-detection is likely due to substantial host galaxy light diluting the subtle AGN variability. Dual-band preliminary tests and tailored simulations may enhance the possibility of detecting variability and lag in future IMBH reverberation campaigns. △ Less

Submitted 19 May, 2024; originally announced May 2024.

Comments: 14 pages, 6 figures, submitted to ApJ, comments welcome

arXiv:2405.11585 [pdf, other]

Improved measurement of the branching fraction of $h_{c}\rightarrowγη^\prime/η$ and search for $h_{c}\rightarrowγπ^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (645 additional authors not shown)

Abstract: The processes $h_c\rightarrowγP(P = η^\prime,~η,~π^{0}))$ are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the… ▽ More The processes $h_c\rightarrowγP(P = η^\prime,~η,~π^{0}))$ are studied with a sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. The branching fractions of $h_c\rightarrowγη^\prime$ and $h_c\rightarrowγη$ are measured to be $(1.40\pm0.11\pm0.04\pm0.10)\times10^{-3}$ and $(3.77\pm0.55\pm0.13\pm0.26)\times10^{-4}$, respectively, where the first uncertainties are statistical, the second systematic, and the third from the branching fraction of $ψ(3686)\rightarrowπ^{0}h_c$. The ratio $R_{h_c}=\frac{\mathscr{B}(h_c\rightarrowγη)}{\mathscr{B}(h_c\rightarrowγη^\prime)}$ is calculated to be $(27.0\pm4.4\pm1.0)\%$. The measurements are consistent with the previous results with improved precision by a factor of 2. The results are valuable for gaining a deeper understanding of $η-η^\prime$ mixing, and its manifestation within quantum chromodynamics. No significant signal is found for the decay $h_c\rightarrowγπ^{0}$, and an upper limit is placed on its branching fraction of $\mathscr{B}(h_c\rightarrowγπ^{0})<5.0\times10^{-5}$, at the 90\% confidence level. △ Less

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.11349 [pdf, other]

Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection

Authors: Xingyu Wu, Yan Zhong, Jibin Wu, Yuxiao Huang, Sheng-hao Wu, Kay Chen Tan

Abstract: In the algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios r… ▽ More In the algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios remain unclear. In this paper, we address this gap by proposing the first provable guarantee for algorithm selection based on algorithm features, taking a generalization perspective. We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors. Specifically, we examine adaptive and predefined algorithm features under transductive and inductive learning paradigms, respectively, and derive upper bounds for the generalization error based on their model's Rademacher complexity. Our theoretical findings not only provide tight upper bounds, but also offer analytical insights into the impact of various factors, such as the training scale of problem instances and candidate algorithms, model parameters, feature values, and distributional differences between the training and test data. Notably, we demonstrate how models will benefit from algorithm features in complex scenarios involving many algorithms, and proves the positive correlation between generalization error bound and $χ^2$-divergence of distributions. △ Less

Submitted 3 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.11143 [pdf, other]

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Authors: Jian Hu, Xibin Wu, Weixun Wang, Xianyu, Dehao Zhang, Yu Cao

Abstract: As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling reinforcement learning from human feedback (RLHF) for training large language models poses coordination challenges across four models. We present Op… ▽ More As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling reinforcement learning from human feedback (RLHF) for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for the models beyond 70B parameters using Ray, vLLM, and DeepSpeed, leveraging improved resource utilization and diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at https://github.com/OpenLLMAI/OpenRLHF. △ Less

Submitted 3 June, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.10030 [pdf, other]

RSDehamba: Lightweight Vision Mamba for Remote Sensing Satellite Image Dehazing

Authors: Huiling Zhou, Xianhao Wu, Hongming Chen, Xiang Chen, Xin He

Abstract: Remote sensing image dehazing (RSID) aims to remove nonuniform and physically irregular haze factors for high-quality image restoration. The emergence of CNNs and Transformers has taken extraordinary strides in the RSID arena. However, these methods often struggle to demonstrate the balance of adequate long-range dependency modeling and maintaining computational efficiency. To this end, we propose… ▽ More Remote sensing image dehazing (RSID) aims to remove nonuniform and physically irregular haze factors for high-quality image restoration. The emergence of CNNs and Transformers has taken extraordinary strides in the RSID arena. However, these methods often struggle to demonstrate the balance of adequate long-range dependency modeling and maintaining computational efficiency. To this end, we propose the first lightweight network on the mamba-based model called RSDhamba in the field of RSID. Greatly inspired by the recent rise of Selective State Space Model (SSM) for its superior performance in modeling linear complexity and remote dependencies, our designed RSDehamba integrates the SSM framework into the U-Net architecture. Specifically, we propose the Vision Dehamba Block (VDB) as the core component of the overall network, which utilizes the linear complexity of SSM to achieve the capability of global context encoding. Simultaneously, the Direction-aware Scan Module (DSM) is designed to dynamically aggregate feature exchanges over different directional domains to effectively enhance the flexibility of sensing the spatially varying distribution of haze. In this way, our RSDhamba fully demonstrates the superiority of spatial distance capture dependencies and channel information exchange for better extraction of haze features. Extensive experimental results on widely used benchmarks validate the surpassing performance of our RSDehamba against existing state-of-the-art methods. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09780 [pdf, other]

EFEAR-4D: Ego-Velocity Filtering for Efficient and Accurate 4D radar Odometry

Authors: Xiaoyi Wu, Yushuai Chen, Zhan Li, Ziyang Hong, Liang Hu

Abstract: Odometry is a crucial component for successfully implementing autonomous navigation, relying on sensors such as cameras, LiDARs and IMUs. However, these sensors may encounter challenges in extreme weather conditions, such as snowfall and fog. The emergence of FMCW radar technology offers the potential for robust perception in adverse conditions. As the latest generation of FWCW radars, the 4D mmWa… ▽ More Odometry is a crucial component for successfully implementing autonomous navigation, relying on sensors such as cameras, LiDARs and IMUs. However, these sensors may encounter challenges in extreme weather conditions, such as snowfall and fog. The emergence of FMCW radar technology offers the potential for robust perception in adverse conditions. As the latest generation of FWCW radars, the 4D mmWave radar provides point cloud with range, azimuth, elevation, and Doppler velocity information, despite inherent sparsity and noises in the point cloud. In this paper, we propose EFEAR-4D, an accurate, highly efficient, and learning-free method for large-scale 4D radar odometry estimation. EFEAR-4D exploits Doppler velocity information delicately for robust ego-velocity estimation, resulting in a highly accurate prior guess. EFEAR-4D maintains robustness against point-cloud sparsity and noises across diverse environments through dynamic object removal and effective region-wise feature extraction. Extensive experiments on two publicly available 4D radar datasets demonstrate state-of-the-art reliability and localization accuracy of EFEAR-4D under various conditions. Furthermore, we have collected a dataset following the same route but varying installation heights of the 4D radar, emphasizing the significant impact of radar height on point cloud quality - a crucial consideration for real-world deployments. Our algorithm and dataset will be available soon at https://github.com/CLASS-Lab/EFEAR-4D. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.09451 [pdf, other]

Exotic charge density waves and superconductivity on the Kagome Lattice

Authors: Rui-Qing Fu, Jun Zhan, Matteo Dürrnagel, Hendrik Hohmann, Ronny Thomale, Jiang** Hu, Ziqiang Wang, Sen Zhou, Xianxin Wu

Abstract: Recent experiments have identified fascinating electronic orders in kagome materials, including intriguing superconductivity, charge density wave (CDW) and nematicity. In particular, some experimental evidence for AV$_3$Sb$_5$ (A = K,Rb,Cs) and related kagome metals hints at the formation of orbital currents in the charge density wave ordered regime, providing a mechanism for spontaneous time-reve… ▽ More Recent experiments have identified fascinating electronic orders in kagome materials, including intriguing superconductivity, charge density wave (CDW) and nematicity. In particular, some experimental evidence for AV$_3$Sb$_5$ (A = K,Rb,Cs) and related kagome metals hints at the formation of orbital currents in the charge density wave ordered regime, providing a mechanism for spontaneous time-reversal symmetry breaking in the absence of local moments. In this work, we comprehensively explore the competitive charge instabilities of the spinless kagome lattice with inter-site Coulomb interactions at the pure-sublattice van Hove filling. From the analysis of the charge susceptibility, we find that, at the nesting vectors, while the onsite charge order is dramatically suppressed, the bond charge orders are substantially enhanced owing to the sublattice texture on the hexagonal Fermi surface. Furthermore, we demonstrate that nearest-neighbor and next nearest-neighbor bonds are characterized by significant intrinsic real and imaginary bond fluctuations, respectively. The 2$\times$2 loop current order is thus favored by the next nearest-neighbor Coulomb repulsion. Interestingly, increasing interactions further leads to a nematic state with intra-cell sublattice density modulation that breaks the $C_6$ rotational symmetry. We further explore superconducting orders descending from onsite and bond charge fluctuations, and discuss our model's implications on the experimental status quo. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.09066 [pdf, other]

Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko , et al. (559 additional authors not shown)

Abstract: We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for… ▽ More We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32~fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90\% confidence level, respectively. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: 14 pages, 7 figures

arXiv:2405.08816 [pdf, other]

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang , et al. (66 additional authors not shown)

Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c… ▽ More In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field. △ Less

Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

arXiv:2405.08251 [pdf, other]

Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events

Authors: Xin Wu, Zhanchao Huang, Li Wang, Jocelyn Chanussot, Jiaojiao Tian

Abstract: In large-scale disaster events, the planning of optimal rescue routes depends on the object detection ability at the disaster scene, with one of the main challenges being the presence of dense and occluded objects. Existing methods, which are typically based on the RGB modality, struggle to distinguish targets with similar colors and textures in crowded environments and are unable to identify obsc… ▽ More In large-scale disaster events, the planning of optimal rescue routes depends on the object detection ability at the disaster scene, with one of the main challenges being the presence of dense and occluded objects. Existing methods, which are typically based on the RGB modality, struggle to distinguish targets with similar colors and textures in crowded environments and are unable to identify obscured objects. To this end, we first construct two multimodal dense and occlusion vehicle detection datasets for large-scale events, utilizing RGB and height map modalities. Based on these datasets, we propose a multimodal collaboration network for dense and occluded vehicle detection, MuDet for short. MuDet hierarchically enhances the completeness of discriminable information within and across modalities and differentiates between simple and complex samples. MuDet includes three main modules: Unimodal Feature Hierarchical Enhancement (Uni-Enh), Multimodal Cross Learning (Mul-Lea), and Hard-easy Discriminative (He-Dis) Pattern. Uni-Enh and Mul-Lea enhance the features within each modality and facilitate the cross-integration of features from two heterogeneous modalities. He-Dis effectively separates densely occluded vehicle targets with significant intra-class differences and minimal inter-class differences by defining and thresholding confidence values, thereby suppressing the complex background. Experimental results on two re-labeled multimodal benchmark datasets, the 4K-SAI-LCS dataset, and the ISPRS Potsdam dataset, demonstrate the robustness and generalization of the MuDet. The codes of this work are available openly at \url{https://github.com/Shank2358/MuDet}. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.08091 [pdf, other]

doi 10.1038/s41467-024-48119-1

Cavity-enhanced photon indistinguishability at room temperature and telecom wavelengths

Authors: Lukas Husel, Julian Trapp, Johannes Scherzer, Xiaojian Wu, Peng Wang, Jacob Fortner, Manuel Nutz, Thomas Hümmer, Borislav Polovnikov, Michael Förg, David Hunger, YuHuang Wang, Alexander Högele

Abstract: Indistinguishable single photons in the telecom-bandwidth of optical fibers are indispensable for long-distance quantum communication. Solid-state single photon emitters have achieved excellent performance in key benchmarks, however, the demonstration of indistinguishability at room-temperature remains a major challenge. Here, we report room-temperature photon indistinguishability at telecom wavel… ▽ More Indistinguishable single photons in the telecom-bandwidth of optical fibers are indispensable for long-distance quantum communication. Solid-state single photon emitters have achieved excellent performance in key benchmarks, however, the demonstration of indistinguishability at room-temperature remains a major challenge. Here, we report room-temperature photon indistinguishability at telecom wavelengths from individual nanotube defects in a fiber-based microcavity operated in the regime of incoherent good cavity-coupling. The efficiency of the coupled system outperforms spectral or temporal filtering, and the photon indistinguishability is increased by more than two orders of magnitude compared to the free-space limit. Our results highlight a promising strategy to attain optimized non-classical light sources. △ Less

Submitted 13 May, 2024; originally announced May 2024.

Journal ref: Nature Communications 15, 3989 (2024)

arXiv:2405.07919 [pdf, other]

Exploring the Low-Pass Filtering Behavior in Image Super-Resolution

Authors: Haoyu Deng, Zi**g Xu, Yule Duan, Xiao Wu, Wenjie Shu, Liang-Jian Deng

Abstract: Deep neural networks for image super-resolution (ISR) have shown significant advantages over traditional approaches like the interpolation. However, they are often criticized as 'black boxes' compared to traditional approaches with solid mathematical foundations. In this paper, we attempt to interpret the behavior of deep neural networks in ISR using theories from the field of signal processing. F… ▽ More Deep neural networks for image super-resolution (ISR) have shown significant advantages over traditional approaches like the interpolation. However, they are often criticized as 'black boxes' compared to traditional approaches with solid mathematical foundations. In this paper, we attempt to interpret the behavior of deep neural networks in ISR using theories from the field of signal processing. First, we report an intriguing phenomenon, referred to as `the sinc phenomenon.' It occurs when an impulse input is fed to a neural network. Then, building on this observation, we propose a method named Hybrid Response Analysis (HyRA) to analyze the behavior of neural networks in ISR tasks. Specifically, HyRA decomposes a neural network into a parallel connection of a linear system and a non-linear system and demonstrates that the linear system functions as a low-pass filter while the non-linear system injects high-frequency information. Finally, to quantify the injected high-frequency information, we introduce a metric for image-to-image tasks called Frequency Spectrum Distribution Similarity (FSDS). FSDS reflects the distribution similarity of different frequency components and can capture nuances that traditional metrics may overlook. Code, videos and raw experimental results for this paper can be found in: https://github.com/RisingEntropy/LPFInISR. △ Less

Submitted 23 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024

arXiv:2405.07845 [pdf, other]

Multi-Task Learning for Fatigue Detection and Face Recognition of Drivers via Tree-Style Space-Channel Attention Fusion Network

Authors: Shulei Qu, Zhenguo Gao, Xiaowei Chen, Na Li, Yakai Wang, Xiaoxiao Wu

Abstract: In driving scenarios, automobile active safety systems are increasingly incorporating deep learning technology. These systems typically need to handle multiple tasks simultaneously, such as detecting fatigue driving and recognizing the driver's identity. However, the traditional parallel-style approach of combining multiple single-task models tends to waste resources when dealing with similar task… ▽ More In driving scenarios, automobile active safety systems are increasingly incorporating deep learning technology. These systems typically need to handle multiple tasks simultaneously, such as detecting fatigue driving and recognizing the driver's identity. However, the traditional parallel-style approach of combining multiple single-task models tends to waste resources when dealing with similar tasks. Therefore, we propose a novel tree-style multi-task modeling approach for multi-task learning, which rooted at a shared backbone, more dedicated separate module branches are appended as the model pipeline goes deeper. Following the tree-style approach, we propose a multi-task learning model for simultaneously performing driver fatigue detection and face recognition for identifying a driver. This model shares a common feature extraction backbone module, with further separated feature extraction and classification module branches. The dedicated branches exploit and combine spatial and channel attention mechanisms to generate space-channel fused-attention enhanced features, leading to improved detection performance. As only single-task datasets are available, we introduce techniques including alternating updation and gradient accumulation for training our multi-task model using only the single-task datasets. The effectiveness of our tree-style multi-task learning model is verified through extensive validations. △ Less

Submitted 13 May, 2024; originally announced May 2024.

Showing 101–150 of 4,607 results for author: Wu, X