Search | arXiv e-print repository

A Unified Framework for 3D Scene Understanding

Authors: Wei Xu, Chunsheng Shi, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai

Abstract: We propose UniSeg3D, a unified 3D segmentation framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation tasks within a single model. Most previous 3D segmentation approaches are specialized for a specific task, thereby limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method unifies six tasks into unified representations processed by the same Transformer. It facilitates inter-task knowledge sharing and, therefore, promotes comprehensive 3D scene understanding. To take advantage of multi-task unification, we enhance the performance by leveraging task connections. Specifically, we design a knowledge distillation method and a contrastive learning method to transfer task-specific knowledge across different tasks. Benefiting from extensive inter-task knowledge sharing, our UniSeg3D becomes more powerful. Experiments on three benchmarks, including the ScanNet20, ScanRefer, and ScanNet200, demonstrate that the UniSeg3D consistently outperforms current SOTA methods, even those specialized for individual tasks. We hope UniSeg3D can serve as a solid unified baseline and inspire future work. The code will be available at https://dk-liang.github.io/UniSeg3D/.

Submitted 3 July, 2024; originally announced July 2024.

Comments: The code will be available at https://dk-liang.github.io/UniSeg3D/

arXiv:2407.01351 [pdf, other]

Probing the connection between IceCube neutrinos and MOJAVE AGN

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (399 additional authors not shown)

Abstract: Active Galactic Nuclei (AGN) are prime candidate sources of the high-energy, astrophysical neutrinos detected by IceCube. This is demonstrated by the real-time multi-messenger detection of the blazar TXS 0506+056 and the recent evidence of neutrino emission from NGC 1068 from a separate time-averaged study. However, the production mechanism of the astrophysical neutrinos in AGN is not well established, which can be resolved via correlation studies with photon observations. For neutrinos produced due to photohadronic interactions in AGN, in addition to a correlation of neutrinos with high-energy photons, there would also be a correlation of neutrinos with photons emitted at radio wavelengths. In this work, we perform an in-depth stacking study of the correlation between 15 GHz radio observations of AGN reported in the MOJAVE XV catalog, and ten years of neutrino data from IceCube. We also use a time-dependent approach which improves the statistical power of the stacking analysis. No significant correlation was found in either analysis, and upper limits are reported. When compared to the IceCube diffuse flux, at 100 TeV and for a spectral index of 2.5, the upper limits derived are $\sim3\%$ and $\sim9\%$ for the time-averaged and time-dependent case, respectively.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 14 Pages 7 Figures

arXiv:2407.01314 [pdf, other]

Search for a light sterile neutrino with 7.5 years of IceCube DeepCore data

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (399 additional authors not shown)

Abstract: We present a search for an eV-scale sterile neutrino using 7.5 years of data from the IceCube DeepCore detector. The analysis uses a sample of 21,914 events with energies between 5 and 150 GeV to search for sterile neutrinos through atmospheric muon neutrino disappearance. Improvements in event selection and treatment of systematic uncertainties provide greater statistical power compared to previous DeepCore sterile neutrino searches. Our results are compatible with the absence of mixing between active and sterile neutrino states, and we place constraints on the mixing matrix elements $|U_{μ4}|^2 < 0.0534$ and $|U_{τ4}|^2 < 0.0574$ at 90% CL under the assumption that $Δm^2_{41}\geq 1\;\mathrm{eV^2}$. These null results add to the growing tension between anomalous appearance results and constraints from disappearance searches in the 3+1 sterile neutrino landscape.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 11 pages, 5 figures. To be submitted to Physical Review D

arXiv:2407.01016 [pdf, other]

SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection

Authors: Dingkang Liang, Wei Hua, Chunsheng Shi, Zhikang Zou, Xiaoqing Ye, Xiang Bai

Abstract: Semi-supervised object detection (SSOD), leveraging unlabeled data to boost object detectors, has become a hot topic recently. However, existing SSOD approaches mainly focus on horizontal objects, leaving multi-oriented objects common in aerial images unexplored. At the same time, the annotation cost of multi-oriented objects is significantly higher than that of their horizontal counterparts. Therefore, in this paper, we propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++. Specifically, we observe that objects in aerial images usually exhibit arbitrary orientations, small scales, and aggregation, which inspires the following core designs: a Simple Instance-aware Dense Sampling (SIDS) strategy is used to generate comprehensive dense pseudo-labels; the Geometry-aware Adaptive Weighting (GAW) loss dynamically modulates the importance of each pair between pseudo-label and corresponding prediction by leveraging the intricate geometric information of aerial objects; we treat aerial images as global layouts and explicitly build the many-to-many relationship between the sets of pseudo-labels and predictions via the proposed Noise-driven Global Consistency (NGC). Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method. For example, on the DOTA-V1.5 benchmark, the proposed method outperforms previous state-of-the-art (SOTA) by a large margin (+2.92, +2.39, and +2.57 mAP under 10%, 20%, and 30% labeled data settings, respectively) with single-scale training and testing. More importantly, it still improves upon a strong supervised baseline with 70.66 mAP, trained using the full DOTA-V1.5 train-val set, by +1.82 mAP, resulting in 72.48 mAP and setting a new state of the art. The code will be made available.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00788 [pdf, other]

InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation

Authors: Haofan Wang, Peng Xing, Renyuan Huang, Hao Ai, Qixun Wang, Xu Bai

Abstract: Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. Although diffusion models have demonstrated impressive generative power in personalized subject-driven or style-driven applications, existing state-of-the-art methods still encounter difficulties in achieving a seamless balance between content preservation and style enhancement. For example, amplifying the style's influence can often undermine the structural integrity of the content. To address these challenges, we deconstruct the style transfer task into three core elements: 1) Style, focusing on the image's aesthetic characteristics; 2) Spatial Structure, concerning the geometric arrangement and composition of visual elements; and 3) Semantic Content, which captures the conceptual meaning of the image. Guided by these principles, we introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style. Specifically, our method accomplishes style injection through an efficient, lightweight process, utilizing the cutting-edge InstantStyle framework. To reinforce the content preservation, we initiate the process with an inverted content latent noise and a versatile plug-and-play tile ControlNet for preserving the original image's intrinsic layout. We also incorporate a global semantic adapter to enhance the semantic content's fidelity. To safeguard against the dilution of style information, a style extractor is employed as a discriminator for providing supplementary style guidance. Codes will be available at https://github.com/instantX-research/InstantStyle-Plus.

Submitted 30 June, 2024; originally announced July 2024.

Comments: Technical Report

arXiv:2407.00136 [pdf, other]

Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.

Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.13157 [pdf, other]

Genetics-based deperturbation analysis for the spin-orbit coupled ${\rm A}^1Σ^+$ and ${\rm b}^3Π_{0^+}$ states of LiRb

Authors: Yide Yin, Xuhui Bai, Xuechun Li, Xin-Yu Luo, Jie Yu, Gaoren Wang, Yongchang Han

Abstract: We present a deperturbation analysis of the spin-orbit coupled $\rm A^1Σ^+$ and $\rm b^3Π_{0^+}$ states of LiRb based on the rovibrational energy levels observed previously by photoassociation spectroscopy in the bosonic $^7$Li$^{85}$Rb molecule. Using the genetic algorithm, we fit the potential energy curves of the $\rm A^1Σ^+$ state and the $\rm b^3Π$ state into point-wise form. We then fit these point-wise potentials along with the spin-orbit coupling into expanded Morse oscillator functional form and optimise analytical parameters based on the experimental data. From the fitted results, we calculate the transition dipole moment matrix elements for transitions from the rovibrational levels of the coupled $\rm A^1Σ^+$-$\rm b^3Π_{0^+}$ state to the Feshbach state and the absolute rovibrational ground state for the fermionic $^6$Li$^{87}$Rb molecule. Based on the calculated transition dipole moment matrix elements, several levels of the coupled $\rm A^1Σ^+$-$\rm b^3Π_{0^+}$ state are predicted to be suitable as the intermediate state for stimulated Raman adiabatic passage transfer from the Feshbach state to the absolute rovibrational ground state. In addition, we provide a similar estimation for the ${\rm B}^1Π$-${\rm c}^3Σ_1^+$-${\rm b}^3Π_1$ state based on available $ab\ initio$ interaction potentials.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 12 pages, 9 figures

arXiv:2406.11191 [pdf, other]

A Survey on Human Preference Learning for Large Language Models

Authors: Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

Abstract: The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which may prevent a deeper comprehension of the relationships between human preferences and LLMs as well as the realization of their limitations. In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs. We first categorize the human feedback according to data sources and formats. We then summarize techniques for human preferences modeling and compare the advantages and disadvantages of different schools of models. Moreover, we present various preference usage methods sorted by the objectives to utilize human preference signals. Finally, we summarize some prevailing approaches to evaluate LLMs in terms of alignment with human intentions and discuss our outlooks on the human intention alignment for LLMs.

Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: IEEE copyright statement added (also applied to the former version)

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, as they have low fluxes of astrophysical $γ$-ray background and large amounts of dark matter. By analyzing more than 700 days of observational data from LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.08135 [pdf]

Design, modeling, and characteristics of ringshaped robot actuated by functional fluid

Authors: Zebing Mao, Xuehang Bai, Yanhong Peng, Yayi Shen

Abstract: The controlled actuation of hydraulic and pneumatic actuators has unveiled fresh and thrilling opportunities for designing mobile robots with adaptable structures. Previously reported rolling robots, which were powered by fluidic systems, often relied on complex principles, cumbersome pump and valve systems, and intricate control strategies, limiting their applicability in other fields. In this investigation, we employed a distinct category of functional fluid identified as Electrohydrodynamic (EHD) fluid, serving as the pivotal element within the ring-shaped actuator. A short stream of functional fluid is placed within a fluidic channel and is then actuated by applying a direct current voltage, shifting the center of mass of the robot and finally pushing the actuator to roll. We designed a ring-shaped fluidic robot, manufactured it using digital machining methods, and evaluated the robot's characteristics. Furthermore, we developed static and dynamic models to analyze the oscillation and rolling motion of the ring-shaped robots using the Lagrange method. This study is anticipated to contribute to the expansion of current research on EHD flexible actuators, enabling the realization of complex robotic systems.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07689 [pdf, other]

Evidence for Non-zero Turbulence in the Protoplanetary disc around IM Lup

Authors: Kevin Flaherty, A. Meredith Hughes, Jacob B. Simon, Alicia Smith Reina, Chunhua Qi, Xue-Ning Bai, Sean M. Andrews, David J. Wilner, Agnes Kospal

Abstract: The amount of turbulence in protoplanetary discs around young stars is critical for determining the efficiency, timeline, and outcomes of planet formation. It is also difficult to measure. Observations are still limited, but direct measurements of the non-thermal, turbulent gas motion are possible with the Atacama Large Millimeter/submillimeter Array (ALMA). Using CO(2-1)/$^{13}$CO(2-1)/C$^{18}$O(2-1) ALMA observations of the disc around IM Lup at ~0.4" (~60 au) resolution we find evidence of significant turbulence, at the level of $δv_{\rm turb}=(0.18-0.30)$c$_s$. This result is robust against systematic uncertainties (e.g., amplitude flux calibration, midplane gas temperature, disc self-gravity). We find that gravito-turbulence as the source of the gas motion is unlikely based on the lack of an imprint on the rotation curve from a massive disc, while magneto-rotational instabilities and hydrodynamic instabilities are still possible, depending on the unknown magnetic field strength and the cooling timescale in the outer disc.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by MNRAS, 17 pages, 12 figures

arXiv:2406.07601 [pdf, other]

IceCube Search for Neutrino Emission from X-ray Bright Seyfert Galaxies

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (400 additional authors not shown)

Abstract: The recent IceCube detection of TeV neutrino emission from the nearby active galaxy NGC 1068 suggests that active galactic nuclei (AGN) could make a sizable contribution to the diffuse flux of astrophysical neutrinos. The absence of TeV $γ$-rays from NGC 1068 indicates neutrino production in the vicinity of the supermassive black hole, where the high radiation density leads to $γ$-ray attenuation. Therefore, any potential neutrino emission from similar sources is not expected to correlate with high-energy $γ$-rays. Disk-corona models predict neutrino emission from Seyfert galaxies to correlate with keV X-rays, as they are tracers of coronal activity. Using through-going track events from the Northern Sky recorded by IceCube between 2011 and 2021, we report results from a search for individual and aggregated neutrino signals from 27 additional Seyfert galaxies that are contained in the BAT AGN Spectroscopic Survey (BASS). Besides the generic single power-law, we evaluate the spectra predicted by the disk-corona model. Assuming all sources to be intrinsically similar to NGC 1068, our findings constrain the collective neutrino emission from X-ray bright Seyfert galaxies in the Northern Hemisphere, but, at the same time, show excesses of neutrinos that could be associated with the objects NGC 4151 and CGCG 420-015. These excesses result in a 2.7$σ$ significance with respect to background expectations.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 17 pages, 9 figures

arXiv:2406.07232 [pdf, other]

DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

Authors: Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang

Abstract: Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual learning of translation tasks to provide effective feedback, thereby enhancing the models' self-reflective abilities and improving translation performance. The application of this method across various translation tasks has proven its effectiveness in improving translation accuracy and eliminating ambiguities, especially in translation tasks with low-resource language pairs.

Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 main conference

arXiv:2406.07201 [pdf, ps, other]

Exact blow-up profiles for the parabolic-elliptic Keller-Segel system in dimensions $N\ge 3$

Authors: Xueli Bai, Maolin Zhou

Abstract: In this paper, we obtain the exact blow-up profiles of solutions of the Keller-Segel-Patlak system in the space with dimensions $N\ge 3$, which solves an open problem proposed by P. Souplet and M. Winkler in 2019. To establish this achievement, we develop the zero number argument for nonlinear equations with unbounded coefficients and construct a family of auxiliary backward self-similar solutions through nontrivial ODE analysis.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07036 [pdf, other]

Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model

Authors: Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

Abstract: Large language models (LLMs) have showcased impressive multilingual machine translation ability. However, unlike encoder-decoder style models, decoder-only LLMs lack an explicit alignment between source and target contexts. Analyzing contribution scores during generation processes revealed that LLMs can be biased towards previously generated tokens over corresponding source tokens, leading to unfaithful translations. To address this issue, we propose to encourage LLMs to pay more attention to the source context from both source and target perspectives in zero-shot prompting: 1) adjust source context attention weights; 2) suppress irrelevant target prefix influence. Additionally, we propose 3) avoiding over-reliance on the target prefix in instruction tuning. Experimental results, from both human-collected unfaithfulness test sets focusing on LLM-generated unfaithful translations and general test sets, verify our methods' effectiveness across multiple language pairs. Further human evaluation shows our method's efficacy in reducing hallucinatory translations and facilitating faithful translation generation.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by ACL2024 Findings

arXiv:2406.06684 [pdf, other]

Search for neutrino emission from hard X-ray AGN with IceCube

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (401 additional authors not shown)

Abstract: Active Galactic Nuclei (AGN) are promising candidate sources of high-energy astrophysical neutrinos since they provide environments rich in matter and photon targets where cosmic ray interactions may lead to the production of gamma rays and neutrinos. We searched for high-energy neutrino emission from AGN using the $\textit{Swift}$-BAT Spectroscopic Survey (BASS) catalog of hard X-ray sources and 12 years of IceCube muon track data. First, upon performing a stacked search, no significant emission was found. Second, we searched for neutrinos from a list of 43 candidate sources and found an excess from the direction of two sources, Seyfert galaxies NGC 1068 and NGC 4151. We observed NGC 1068 at flux $φ_{ν_μ+\barν_μ}$ = $4.02_{-1.52}^{+1.58} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV, with power-law spectral index, $γ$ = 3.10$^{+0.26}_{-0.22}$, consistent with previous IceCube results. The observation of a neutrino excess from the direction of NGC 4151 is at a post-trial significance of 2.9$σ$. If interpreted as an astrophysical signal, the excess observed from NGC 4151 corresponds to a flux $φ_{ν_μ+\barν_μ}$ = $1.51_{-0.81}^{+0.99} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV and $γ$ = 2.83$^{+0.35}_{-0.28}$.

Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04801 [pdf, other]

MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

Authors: Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai

Abstract: The sparsely activated mixture of experts (MoE) model presents a promising alternative to traditional densely activated (dense) models, enhancing both quality and computational efficiency. However, training MoE models from scratch demands extensive data and computational resources. Moreover, public repositories like timm mainly provide pre-trained dense checkpoints, lacking similar resources for MoE models, hindering their adoption. To bridge this gap, we introduce MoE Jetpack, an effective method for fine-tuning dense checkpoints into MoE models. MoE Jetpack incorporates two key techniques: (1) checkpoint recycling, which repurposes dense checkpoints as initial weights for MoE models, thereby accelerating convergence, enhancing accuracy, and alleviating the computational burden of pre-training; (2) hyperspherical adaptive MoE (SpheroMoE) layer, which optimizes the MoE architecture for better integration of dense checkpoints, enhancing fine-tuning performance. Our experiments on vision tasks demonstrate that MoE Jetpack significantly improves convergence speed and accuracy when fine-tuning dense checkpoints into MoE models. Our code will be publicly available at https://github.com/Adlith/MoE-Jetpack.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 9 pages, 6 figures

ACM Class: I.2

arXiv:2406.03019 [pdf, other]

Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

Authors: Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu

Abstract: Oracle Bone Inscriptions is one of the oldest existing forms of writing in the world. However, due to the great antiquity of the era, a large number of Oracle Bone Inscriptions (OBI) remain undeciphered, making it one of the global challenges in the field of paleography today. This paper introduces a novel approach, namely Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters through radical reconstruction. We deconstruct OBI into foundational strokes and radicals, then employ a Transformer model to reconstruct them into their modern counterparts, offering a groundbreaking solution to ancient script analysis. To further this endeavor, a new Ancient Chinese Character Puzzles (ACCP) dataset was developed, comprising an extensive collection of character images from seven key historical stages, annotated with detailed radical sequences. The experiments have showcased considerable promising insights, underscoring the potential and effectiveness of our approach in deciphering the intricacies of ancient Chinese scripts. Through this novel dataset and methodology, we aim to bridge the gap between traditional philology and modern document analysis techniques, offering new insights into the rich history of Chinese linguistic heritage.

Submitted 5 June, 2024; originally announced June 2024.

Comments: ICDAR 2024

arXiv:2406.01302 [pdf]

Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data

Authors: Zhusi Zhong, Helen Zhang, Fayez H. Fayad, Andrew C. Lancaster, John Sollee, Shreyas Kulkarni, Cheng Ting Lin, Jie Li, Xinbo Gao, Scott Collins, Colin Greineder, Sun H. Ahn, Harrison X. Bai, Zhicheng Jiao, Michael K. Atalay

Abstract: Purpose: Pulmonary embolism (PE) is a significant cause of mortality in the United States. The objective of this study is to implement deep learning (DL) models using Computed Tomography Pulmonary Angiography (CTPA), clinical data, and PE Severity Index (PESI) scores to predict PE mortality. Materials and Methods: 918 patients (median age 64 years, range 13-99 years, 52% female) with 3,978 CTPAs were identified via retrospective review across three institutions. To predict survival, an AI model was used to extract disease-related imaging features from CTPAs. Imaging features and/or clinical variables were then incorporated into DL models to predict survival outcomes. Four models were developed as follows: (1) using CTPA imaging features only; (2) using clinical variables only; (3) multimodal, integrating both CTPA and clinical variables; and (4) multimodal fused with calculated PESI score. Performance and contribution from each modality were evaluated using concordance index (c-index) and Net Reclassification Improvement, respectively. Performance was compared to PESI predictions using the Wilcoxon signed-rank test. Kaplan-Meier analysis was performed to stratify patients into high- and low-risk groups. Additional factor-risk analysis was conducted to account for right ventricular (RV) dysfunction. Results: For both data sets, the PESI-fused and multimodal models achieved higher c-indices than PESI alone. Following stratification of patients into high- and low-risk groups by multimodal and PESI-fused models, mortality outcomes differed significantly (both p<0.001). A strong correlation was found between high-risk grouping and RV dysfunction. Conclusions: Multiomic DL models incorporating CTPA features, clinical data, and PESI achieved higher c-indices than PESI alone for PE survival prediction.

Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00948 [pdf]

Real-space tilting method for atomic resolution STEM imaging of nanocrystalline materials

Authors: Jiake Wei, Zhangze Xu, Wenjie Shen, Bin Feng, Ryo Ishikawa, Naoya Shibata, Yuichi Ikuhara, Xuedong Bai

Abstract: Atomic-resolution scanning transmission electron microscopy (STEM) characterization requires precise tilting of the specimen to a high-symmetry zone axis, which is usually done in reciprocal space by following the diffraction patterns. However, for small-sized nanocrystalline materials, the diffraction patterns are too faint to guide the tilting process. Here, a simple and effective tilting method is developed based on the diffraction contrast change of the shadow image in the Ronchigram. We can calculate the misorientation angle of the specimen and tilt it to the zone axis based on the position of the shadow image with lowest intensity. This method requires no prior knowledge of the sample, and the maximum misorientation angle we can correct is greater than ±6.9 degrees with sub-mrad accuracy. It is performed in real space, without recording the diffraction patterns of the specimens, so it can be applied effectively to nanocrystalline materials. Combined with scripting to control the microscope, we can automatically tilt the sample to the zone axis under low-dose conditions (<0.17 e⁻/Å²/s), which could facilitate the imaging of beam-sensitive materials such as zeolites or metal organic frameworks. This automated tilting method could contribute to the atomic-scale characterization of nanocrystalline materials by STEM imaging.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00905 [pdf, other]

Exploration of mass splitting and muon/tau mixing parameters for an eV-scale sterile neutrino with IceCube

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (400 additional authors not shown)

Abstract: We present the first three-parameter fit to a 3+1 sterile neutrino model using 7.634 years of data from the IceCube Neutrino Observatory on $ν_μ+\overlineν_μ$ charged-current interactions in the energy range 500-9976 GeV. Our analysis is sensitive to the mass-squared splitting between the heaviest and lightest mass state ($Δm_{41}^2$), the mixing matrix element connecting muon flavor to the fourth mass state ($|U_{\mu4}|^2$), and the element connecting tau flavor to the fourth mass state ($|U_{\tau4}|^2$). Predicted propagation effects in matter enhance the signature through a resonance as atmospheric neutrinos from the Northern Hemisphere traverse the Earth to the IceCube detector at the South Pole. The result is consistent with the no-sterile neutrino hypothesis with a probability of 4.3 %. Profiling the likelihood of each parameter yields the 90 % confidence levels: $ 2.4\,\mathrm{eV}^{2} < Δm_{41}^2 <9.6\,\mathrm{eV}^{2} $ , $0.0081 < |U_{\mu4}|^2 < 0.10$ , and $|U_{\tau4}|^2< 0.035$, which narrows the allowed parameter-space for $|U_{\tau4}|^2$. However, the primary result of this analysis is the first map of the 3+1 parameter space exploring the interdependence of $Δm_{41}^2$, $|U_{\mu4}|^2$, and $|U_{\tau4}|^2$.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00684 [pdf, other]

Deciphering Oracle Bone Language with Diffusion Models

Authors: Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu

Abstract: Originating from China's Shang Dynasty approximately 3,000 years ago, the Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technologies presents a novel frontier for OBS decipherment, challenging traditional NLP methods that rely heavily on large textual corpora, a luxury not afforded by historical languages. This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD). Utilizing a conditional diffusion-based strategy, OBSD generates vital clues for decipherment, charting a new course for AI-assisted analysis of ancient languages. To validate its efficacy, extensive experiments were conducted on an oracle bone script dataset, with quantitative results demonstrating the effectiveness of OBSD. Code and decipherment results will be made available at https://github.com/guanhaisu/OBSD.

Submitted 2 June, 2024; originally announced June 2024.

Comments: ACL2024 main conference long paper

arXiv:2405.18699 [pdf, ps, other]

Correction for the Weakening Magnetic Field within the Sunspot Umbra Observed by ASO-S/FMG

Authors: Haiqing Xu, Jiangtao Su, Suo Liu, Yuanyong Deng, Xianyong Bai, Jie Chen, Xiaofan Wang, Xiao Yang, Yongliang Song

Abstract: The magnetic field inside the sunspot umbra, as observed by the Full-disk MagnetoGraph (FMG) onboard the Advanced Space based Solar Observatory (ASO-S), was found to be experiencing a weakening. To address this issue, we employed a method developed by Xu et al. (2021) to correct the weakening in the data of 20 active regions observed by FMG during the period spanning December 29, 2022, to July 23, 2023. Research has revealed that the onset of magnetic field weakening occurs at a minimum magnetic field strength of 705 G, with the peak strength reaching up to 1931 G. We computed the change ratio (R1) of the unsigned magnetic flux within the sunspot umbra, considering measurements both before and after correction. The change ratio (R1) spans from 26% to 124%, indicating a significant increase in the unsigned magnetic flux within sunspot umbrae observed by FMG after correction. To illustrate this, we selected four active regions for comparison with data from the Helioseismic and Magnetic Imager (HMI). After correction, it is found that the unsigned magnetic flux in sunspot umbrae measured by FMG aligns more closely with that of HMI. This supports the effectiveness of the corrective method for FMG, despite imperfections, particularly at the umbra-penumbra boundary.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 12 pages, 5 figures

arXiv:2405.16741 [pdf, other]

doi 10.1007/s11207-024-02311-0

A Study on Magnetic-sensitivity Wavelength Position of the Working Line Used by the Full-Disk Magnetograph onboard the Advanced Space based Solar Observatory (ASO-S/FMG)

Authors: S. Liu, J. T. Su, X. Y. Bai, Y. Y. Deng, J. Chen, Y. L. Song, X. F. Wang, H. Q. Xu, X. Yang, Shahid Idrees

Abstract: Utilizing data from the Solar Magnetism and Activity Telescope (SMAT), analytical solutions of polarized radiative transfer equations, and in-orbit test data from the Full-disk Magnetograph (FMG) onboard the Advanced Space based Solar Observatory (ASO-S), this study reveals the magnetic-sensitivity spectral positions for the Fe I λ5234.19 Å working line used by FMG. From the experimental data of SMAT, it is found that the most sensitive position is located at the line center for linear polarization (Stokes-Q/U), while it is about -0.07 Å away from the line center for circular polarization (Stokes-V). Moreover, both the theoretical analysis and the in-orbit test data analysis of FMG confirm these results. Additionally, the theoretical analysis suggests the presence of distinct spectral pockets (centered at 0.08-0.15 Å) from the line, harboring intense magnetic sensitivity across all three Stokes parameters. Striking a balance between high sensitivity for both linear and circular polarization while capturing additional valuable information, a spectral position of -0.08 Å emerges as the best choice for routine FMG magnetic-field observations.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 12 pages, 8 figures

Journal ref: Solar Physics, May 2024

arXiv:2405.16038 [pdf, other]

Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection

Authors: Xue Zhang, Si-Yuan Cao, Fang Wang, Runmin Zhang, Zhe Wu, Xiaohan Zhang, Xiaokai Bai, Hui-Liang Shen

Abstract: Most recent multispectral object detectors employ a two-branch structure to extract features from RGB and thermal images. While the two-branch structure achieves better performance than a single-branch structure, it overlooks inference efficiency. This conflict is becoming increasingly acute, as recent works solely pursue higher performance rather than both performance and efficiency. In this paper, we address this issue by improving the performance of efficient single-branch structures. We revisit the reasons causing the performance gap between these structures. For the first time, we reveal the information interference problem in the naive early-fusion strategy adopted by previous single-branch structures. Besides, we find that the domain gap between multispectral images and the weak feature representation of the single-branch structure are also key obstacles to performance. Focusing on these three problems, we propose corresponding solutions, including a novel shape-priority early-fusion strategy, a weakly supervised learning method, and a core knowledge distillation technique. Experiments demonstrate that single-branch networks equipped with these three contributions achieve significant performance enhancements while retaining high efficiency. Our code will be available at https://github.com/XueZ-phd/Efficient-RGB-T-Early-Fusion-Detection.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14983 [pdf, other]

The Solar Origin of an Intense Geomagnetic Storm on 2023 December 1st: Successive Slipping and Eruption of Multiple Magnetic Flux Ropes

Authors: Zheng Sun, Ting Li, Yijun Hou, Hui Tian, Ziqi Wu, Ke Li, Yining Zhang, Zhentong Li, Xianyong Bai, Li Feng, Chuan Li, Zhenyong Hou, Qiao Song, Jingsong Wang, Guiping Zhou

Abstract: The solar eruption that occurred on 2023 November 28 (SOL2023-11-28) triggered an intense geomagnetic storm on Earth on 2023 December 1. The associated Earth's auroras manifested at the most southern latitudes in the northern hemisphere observed in the past two decades. In order to explore the profound geoeffectiveness of this event, we conducted a comprehensive analysis of its solar origin to offer potential factors contributing to its impact. Magnetic flux ropes (MFRs) are twisted magnetic structures recognized as significant contributors to coronal mass ejections (CMEs), thereby impacting space weather greatly. In this event, we identified multiple MFRs in the solar active region and observed distinct slipping processes of the three MFRs: MFR1, MFR2, and MFR3. All three MFRs exhibit slipping motions at a speed of 40--137 km s$^{-1}$, extending beyond their original locations. Notably, the slipping of MFR2 extends to $\sim$30 Mm and initiates the eruption of MFR3. Ultimately, MFR1's eruption results in an M3.4-class flare and a CME, while MFR2 and MFR3 collectively produce an M9.8-class flare and another halo CME. This study shows the slipping process in a multi-MFR system, showing how one MFR's slipping can trigger the eruption of another MFR. We propose that the CME--CME interactions caused by multiple MFR eruptions may contribute to the significant geoeffectiveness.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13874 [pdf, other]

Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching

Authors: Hongkai Chen, Zixin Luo, Yurun Tian, Xuyang Bai, Ziyu Wang, Lei Zhou, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan

Abstract: Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformer. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cross-view deformations. Secondly, we present selective fusion to merge local and global messages from cross attention. Apart from network structure, we also identify the importance of enforcing spatial smoothness in loss design, which has been omitted by previous works. Based on these augmentations, our network demonstrates strong matching capacity under different settings. The full version of our network achieves state-of-the-art performance among semi-dense matching methods at a similar cost to LoFTR, while the slim version reaches LoFTR baseline's performance with only 15% computation cost and 18% parameters.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR2024 Image Matching Workshop

arXiv:2405.13311 [pdf, other]

Observation of a large-scale filament eruption initiated by two small-scale erupting filaments pushing out from below

Authors: Yongliang Song, Jiangtao Su, Qingmin Zhang, Mei Zhang, Yuanyong Deng, Xianyong Bai, Suo Liu, Xiao Yang, Jie Chen, Haiqing Xu, Kaifan Ji, Ziyao Hu

Abstract: Filament eruptions often result in flares and coronal mass ejections (CMEs). Most studies attribute the filament eruptions to their instabilities or magnetic reconnection. In this study, we report a unique observation of a filament eruption whose initiation process has not been reported before. This large-scale filament, with a length of about 360 Mm crossing an active region, is forced to erupt by two small-scale erupting filaments pushing out from below. This process of multi-filament eruption results in an M6.4 flare in the active region NOAA 13229 on 25th February 2023. The whole process can be divided into three stages: the eruptions of two active-region filaments F1 and F2; the interactions between the erupting F1, F2, and the large-scale filament F3; and the eruption of F3. Though this multi-filament eruption occurs near the northwest limb of the solar disk, it produces a strong halo CME that causes a significant geomagnetic disturbance. Our observations present a new filament eruption mechanism, in which the initial kinetic energy of the eruption is obtained from, and transferred by, other erupting structures. This event provides us a unique insight into the dynamics of multi-filament eruptions and their corresponding effects on the interplanetary space.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 16 pages, 10 figures. Accepted for publication in Solar Physics

arXiv:2405.12533 [pdf]

Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering

Authors: Hiba Maryam, Ling Fu, Jiajun Song, Tajrian ABM Shafayet, Qidi Luo, Xiang Bai, Yuliang Liu

Abstract: The development of Urdu scene text detection, recognition, and Visual Question Answering (VQA) technologies is crucial for advancing accessibility, information retrieval, and linguistic diversity in digital content, facilitating better understanding and interaction with Urdu-language visual data. This initiative seeks to bridge the gap between textual and visual comprehension. We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images, which can be used for text detection, recognition, and VQA tasks. We provide fine-grained annotations for text instances, addressing the limitations of previous datasets for facing arbitrary-shaped texts. By incorporating additional annotation points, this dataset facilitates the development and assessment of methods that can handle diverse text layouts, intricate shapes, and non-standard orientations commonly encountered in real-world scenarios. Besides, the VQA annotations make it the first benchmark for the Urdu Text VQA method, which can prompt the development of Urdu scene text understanding. The proposed dataset is available at: https://github.com/Hiba-MeiRuan/Urdu-VQA-Dataset-/tree/main

Submitted 21 May, 2024; originally announced May 2024.

Comments: Accepted by the International Conference on Document Analysis and Recognition (ICDAR) 2024

arXiv:2405.12110 [pdf, other]

CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization

Authors: Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, Xiao Bai

Abstract: 3D Gaussian Splatting (3DGS) creates a radiance field consisting of 3D Gaussians to represent a scene. With sparse training views, 3DGS easily suffers from overfitting, negatively impacting the reconstruction quality. This paper introduces a new co-regularization perspective for improving sparse-view 3DGS. When training two 3D Gaussian radiance fields with the same sparse views of a scene, we observe that the two radiance fields exhibit point disagreement and rendering disagreement that can predict reconstruction quality without supervision, stemming from the sampling implementation in densification. We further quantify the point disagreement and rendering disagreement by evaluating the registration between Gaussians' point representations and calculating differences in their rendered pixels. The empirical study demonstrates the negative correlation between the two disagreements and accurate reconstruction, which allows us to identify inaccurate reconstruction without accessing ground-truth information. Based on the study, we propose CoR-GS, which identifies and suppresses inaccurate reconstruction based on the two disagreements: (i) Co-pruning considers Gaussians that exhibit high point disagreement to be in inaccurate positions and prunes them. (ii) Pseudo-view co-regularization considers pixels that exhibit high rendering disagreement to be inaccurately rendered and suppresses the disagreement. Results on LLFF, Mip-NeRF360, DTU, and Blender demonstrate that CoR-GS effectively regularizes the scene geometry, reconstructs the compact representations, and achieves state-of-the-art novel view synthesis quality under sparse training views.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Project page: https://jiaw-z.github.io/CoR-GS/

arXiv:2405.11985 [pdf, other]

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

Authors: Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang

Abstract: Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding. Nonetheless, most existing TEC-VQA benchmarks have focused on high-resource languages like English and Chinese. Despite pioneering works to expand multilingual QA pairs in non-text-centric VQA datasets through translation engines, the translation-based protocol encounters a substantial "visual-textual misalignment" problem when applied to TEC-VQA. Specifically, it prioritizes the text in question-answer pairs while disregarding the visual text present in images. Moreover, it fails to address complexities related to nuanced meaning, contextual distortion, language bias, and question-type diversity. In this work, we tackle multilingual TEC-VQA by introducing MTVQA, the first benchmark featuring high-quality human expert annotations across 9 diverse languages, consisting of 6,778 question-answer pairs across 2,116 images. Further, by comprehensively evaluating numerous state-of-the-art Multimodal Large Language Models (MLLMs), including GPT-4o, GPT-4V, Claude3, and Gemini, on the MTVQA dataset, it is evident that there is still a large room for performance improvement, underscoring the value of MTVQA. Additionally, we supply multilingual training data within the MTVQA dataset, demonstrating that straightforward fine-tuning with this data can substantially enhance multilingual TEC-VQA performance. We aspire that MTVQA will offer the research community fresh insights and stimulate further exploration in multilingual visual text comprehension. The project homepage is available at https://bytedance.github.io/MTVQA/.

Submitted 11 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11485 [pdf]

doi 10.1038/s41467-024-48636-z

Evidence for Multiferroicity in Single-Layer CuCrSe$_2$

Authors: Zhenyu Sun, Yueqi Su, Aomiao Zhi, Zhicheng Gao, Xu Han, Kang Wu, Lihong Bao, Yuan Huang, Youguo Shi, Xuedong Bai, Peng Cheng, Lan Chen, Kehui Wu, Xuezeng Tian, Changzheng Wu, Baojie Feng

Abstract: Multiferroic materials, which simultaneously exhibit ferroelectricity and magnetism, have attracted substantial attention due to their fascinating physical properties and potential technological applications. With the trend towards device miniaturization, there is an increasing demand for the persistence of multiferroicity in single-layer materials at elevated temperatures. Here, we report high-temperature multiferroicity in single-layer CuCrSe$_2$, which hosts room-temperature ferroelectricity and 120 K ferromagnetism. Notably, the ferromagnetic coupling in single-layer CuCrSe$_2$ is enhanced by the ferroelectricity-induced orbital shift of Cr atoms, which is distinct from both types I and II multiferroicity. These findings are supported by a combination of second-harmonic generation, piezo-response force microscopy, scanning transmission electron microscopy, magnetic, and Hall measurements. Our research not only provides an exemplary platform for delving into intrinsic magnetoelectric interactions at the single-layer limit but also sheds light on the potential development of electronic and spintronic devices utilizing two-dimensional multiferroics.

Submitted 19 May, 2024; originally announced May 2024.

Journal ref: Nature Communications 15, 4252 (2024)

arXiv:2405.11437 [pdf, other]

The First Swahili Language Scene Text Detection and Recognition Dataset

Authors: Fadila Wendigoundi Douamba, Jianjun Song, Ling Fu, Yuliang Liu, Xiang Bai

Abstract: Scene text recognition is essential in many applications, including automated translation, information retrieval, driving assistance, and enhancing accessibility for individuals with visual impairments. Much research has been done to improve the accuracy and performance of scene text detection and recognition models. However, most of this research has been conducted in the most common languages, English and Chinese. There is a significant gap in low-resource languages, especially Swahili. Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition. No studies have focused explicitly on Swahili natural scene text detection and recognition, and no dataset for Swahili scene text detection and recognition is publicly available. We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models. The dataset contains 976 images collected in different places and under various circumstances. Each image is annotated at the word level. The proposed dataset can also serve as a benchmark dataset specific to the Swahili language for evaluating and comparing different approaches and fostering future research endeavors. The dataset is available on GitHub via this link: https://github.com/FadilaW/Swahili-STR-Dataset

Submitted 18 May, 2024; originally announced May 2024.

Comments: Accepted to ICDAR 2024

arXiv:2405.08077 [pdf, other]

Methods and stability tests associated with the sterile neutrino search using improved high-energy $ν_μ$ event reconstruction in IceCube

Authors: IceCube Collaboration, R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, et al. (398 additional authors not shown)

Abstract: We provide supporting details for the search for a 3+1 sterile neutrino using data collected over eleven years at the IceCube Neutrino Observatory. The analysis uses atmospheric muon-flavored neutrinos from 0.5 to 100 TeV that traverse the Earth to reach the IceCube detector, and finds a best-fit point at $\sin^2(2θ_{24}) = 0.16$ and $Δm^{2}_{41} = 3.5$ eV$^2$ with a goodness-of-fit p-value of 12% and consistency with the null hypothesis of no oscillations to sterile neutrinos with a p-value of 3.1%. Several improvements were made over past analyses, which are reviewed in this article, including upgrades to the reconstruction and the study of sources of systematic uncertainty. We provide details of the fit quality and discuss stability tests that split the data into separate samples and compare the results. We find that the fits are consistent between split data sets.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 18 pages, 17 figures, 2 tables. This long-form paper is a companion to the letter "A search for an eV-scale sterile neutrino using improved high-energy $ν_μ$ event reconstruction in IceCube."

arXiv:2405.08070 [pdf, other]

A search for an eV-scale sterile neutrino using improved high-energy $ν_μ$ event reconstruction in IceCube

Authors: IceCube Collaboration, R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, et al. (398 additional authors not shown)

Abstract: This Letter presents the result of a 3+1 sterile neutrino search using 10.7 years of IceCube data. We analyze atmospheric muon neutrinos that traverse the Earth with energies ranging from 0.5 to 100 TeV, incorporating significant improvements in modeling neutrino flux and detector response compared to earlier studies. Notably, for the first time, we categorize data into starting and through-going events, distinguishing neutrino interactions with vertices inside or outside the instrumented volume, to improve energy resolution. The best-fit point for a 3+1 model is found to be at $\sin^2(2θ_{24}) = 0.16$ and $Δm^{2}_{41} = 3.5$ eV$^2$, which agrees with previous iterations of this study. The result is consistent with the null hypothesis of no sterile neutrinos with a p-value of 3.1%.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 9 pages, 3 figures. This letter is supported by the long-form paper "Methods and stability tests associated with the sterile neutrino search using improved high-energy $ν_μ$ event reconstruction in IceCube," also appearing on arXiv

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low-frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.06706 [pdf, other]

Exploring the Capabilities of Large Multimodal Models on Dense Text

Authors: Shuo Zhang, Biao Yang, Zhang Li, Zhiyin Ma, Yuliang Liu, Xiang Bai

Abstract: While large multi-modal models (LMM) have shown notable progress in multi-modal tasks, their capabilities in tasks involving dense textual content remain to be fully explored. Dense text, which carries important information, is often found in documents, tables, and product descriptions. Understanding dense text enables us to obtain more accurate information, assisting in making better decisions. To further explore the capabilities of LMM in complex text tasks, we propose the DT-VQA dataset, with 170k question-answer pairs. In this paper, we conduct a comprehensive evaluation of GPT4V, Gemini, and various open-source LMMs on our dataset, revealing their strengths and weaknesses. Furthermore, we evaluate the effectiveness of two strategies for LMM: prompt engineering and downstream fine-tuning. We find that even with automatically labeled training datasets, significant improvements in model performance can be achieved. We hope that this research will promote the study of LMM in dense text tasks. Code will be released at https://github.com/Yuliang-Liu/MultimodalOCR.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03988 [pdf, other]

Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

Authors: Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai

Abstract: Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabilities of Large Language Models (LLMs) pretrained on massive text corpora presents a promising avenue for enhancing recommender systems by integrating open-world domain knowledge. In this paper, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational complexity concerns by utilizing pretrained LLMs as item encoders and freezing LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through offline experiments on a large-scale industrial dataset and online A/B tests, we demonstrate the efficacy of our approach.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 11 pages, 6 figures

arXiv:2405.03817 [pdf, other]

Search for joint multimessenger signals from potential Galactic PeVatrons with HAWC and IceCube

Authors: R. Alfaro, C. Alvarez, J. C. Arteaga-Velázquez, D. Avila Rojas, H. A. Ayala Solares, R. Babu, E. Belmont-Moreno, K. S. Caballero-Mora, T. Capistrán, A. Carramiñana, S. Casanova, U. Cotti, J. Cotzomi, S. Coutiño de León, E. De la Fuente, D. Depaoli, N. Di Lalla, R. Diaz Hernandez, J. C. Díaz-Vélez, K. Engel, T. Ergin, K. L. Fan, K. Fang, N. Fraija, S. Fraija, et al. (469 additional authors not shown)

Abstract: Galactic PeVatrons are sources that can accelerate cosmic rays to PeV energies. The high-energy cosmic rays are expected to interact with the surrounding ambient material or radiation, resulting in the production of gamma rays and neutrinos. To optimize for the detection of such associated production of gamma rays and neutrinos for a given source morphology and spectrum, a multi-messenger analysis that combines gamma rays and neutrinos is required. In this study, we use the Multi-Mission Maximum Likelihood framework (3ML) with IceCube Maximum Likelihood Analysis software (i3mla) and HAWC Accelerated Likelihood (HAL) to search for a correlation between 22 known gamma-ray sources from the third HAWC gamma-ray catalog and 14 years of IceCube track-like data. No significant neutrino emission from the direction of the HAWC sources was found. We report the best-fit gamma-ray model and 90% CL neutrino flux limit from the 22 sources. From the neutrino flux limit, we conclude that the gamma-ray emission from five of the sources cannot be produced purely from hadronic interactions. We report the limit for the fraction of gamma rays produced by hadronic interactions for these five sources.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.01444 [pdf, other]

Data-driven analysis of the beauty hadron production in p+p collisions at the LHC with Bayesian unfolding

Authors: Xiaozhi Bai, Guangsheng Li, Yifei Zhang, Qingyi Situ, Xiaolong Chen

Abstract: Heavy flavour production in proton-proton (pp) collisions provides insights into the fundamental properties of Quantum Chromodynamics (QCD). Beauty hadron production measurements are widely performed through indirect approaches based on their inclusive decay modes. A Bayesian unfolding data-driven analysis of the ALICE and LHCb data was performed in this study, which recovers the full kinematic information of the beauty hadrons via different inclusive decay channels. The corresponding beauty hadron production cross sections obtained after the Bayesian unfolding are found to be consistent within their uncertainties. The weighted average open beauty production cross sections are presented as a function of the transverse momentum and rapidity in pp collisions at $\sqrt{s}$ = 5.02 TeV and $\sqrt{s}$ = 13 TeV, respectively. The $p_T$-integrated open beauty production $\mathrm{d}σ/\mathrm{d}y$ and the total $\mathrm{b}\rm\overline{b}$ cross section $σ_{\rm \mathrm{b}\rm\overline{b}}$ are also reported. The precision of these results significantly improves upon worldwide measurements, providing valuable validation and constraints on mechanisms of heavy flavour production in pp collisions at the LHC energies.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 15 pages, 6 figures

arXiv:2404.19652 [pdf, other]

VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

Authors: Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai

Abstract: Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaptation, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Specifically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters. The Prompt Queries Generation Module facilitates explicit interaction between different tasks, while the Tasks-aware Adapter helps the model dynamically learn suitable features for each task. Additionally, to further enable the model to learn temporal information at a lower cost, we propose a synthetic video text dataset (VTD-368k) by leveraging the Content Deformation Fields (CoDeF) algorithm. Notably, our method outperforms the state-of-the-art method by an average of 2.6% in six cross-domain benchmarks such as TT-to-IC15, CTW1500-to-TT, and TT-to-CTW1500. For video-level cross-domain adaptation, our method even surpasses the previous end-to-end video spotting method in ICDAR2015 video and DSText v2 by an average of 5.5% on the MOTA metric, using only image-level data. We further demonstrate that existing Large Multimodal Models exhibit limitations in generating cross-domain scene text spotting, in contrast to our VimTS model, which requires significantly fewer parameters and data. The code and datasets will be made available at https://VimTextSpotter.github.io.

Submitted 14 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19589 [pdf, other]

Acceptance Tests of more than 10 000 Photomultiplier Tubes for the multi-PMT Digital Optical Modules of the IceCube Upgrade

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi, et al. (399 additional authors not shown)

Abstract: More than 10,000 photomultiplier tubes (PMTs) with a diameter of 80 mm will be installed in multi-PMT Digital Optical Modules (mDOMs) of the IceCube Upgrade. These have been tested and pre-calibrated at two sites. A throughput of more than 1000 PMTs per week with both sites was achieved with a modular design of the testing facilities and highly automated testing procedures. The testing facilities can easily be adapted to other PMTs, such that they can, e.g., be re-used for testing the PMTs for IceCube-Gen2. Single photoelectron response, high voltage dependence, time resolution, prepulse, late pulse, afterpulse probabilities, and dark rates were measured for each PMT. We describe the design of the testing facilities, the testing procedures, and the results of the acceptance tests.

Submitted 20 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: 24 pages, 19 figures, 2 tables, submitted to JINST

arXiv:2404.18366 [pdf, ps, other]

Anomalous Phonon in Charge-Density-Wave Phase of Kagome Metal CsV3Sb5

Authors: Han-Yu Wang, Xiao-Cheng Bai, Wen-Feng Wu, Zhi Zeng, Da-Yong Liu, Liang-Jian Zou

Abstract: CsV$_3$Sb$_5$, a notable compound within the kagome family, is renowned for its topological and superconducting properties, as well as for the local magnetic field and anomalous Hall effect detected in experiments. However, the origin of this local magnetic field is still unclear. In this study, we employ first-principles calculations to investigate the atomic vibration in both the pristine and the charge-density-wave phases of CsV$_3$Sb$_5$. Our analysis reveals the presence of "anomalous phonons" in these structures; these phonons induce the circular vibration of atoms, contributing to the phonon magnetic moments and subsequently to the observed local magnetic fields. Additionally, we observe that lattice distortion in the charge-density-wave phase amplifies these circular vibrations, resulting in a stronger local magnetic field, particularly from the vanadium atoms. This investigation not only reveals the potential relation between lattice distortion and atomic polarization but also offers a novel idea for understanding the origin of the local magnetic moment in CsV$_3$Sb$_5$.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures

arXiv:2404.18092 [pdf, ps, other]

Numerous Bidirectionally Propagating Plasma Blobs near the Reconnection Site of a Solar Eruption

Authors: Zhenyong Hou, Hui Tian, Maria S. Madjarska, Hechao Chen, Tanmoy Samanta, Xianyong Bai, Zhentong Li, Yang Su, Wei Chen, Yuanyong Deng

Abstract: The current sheet is a common structure involved in solar eruptions. However, it is observed in a minority of events, and the physical properties of its fine structures during a solar eruption are rarely investigated. Here, we report an on-disk observation that displays 108 compact, circular or elliptic bright structures, presumably plasma blobs, propagating bidirectionally along a flare current sheet during a period of $\sim$24 minutes. From extreme ultraviolet images, we have investigated the temporal variation of the blob number around the flare peak time. The current sheet connects the flare loops and the erupting filament. The width, duration, projected velocity, temperature, and density of these blobs are $\sim$1.7$\pm$0.5 Mm, $\sim$79$\pm$57 s, $\sim$191$\pm$81 km s$^{-1}$, $\sim$10$^{6.4\pm0.1}$ K, and $\sim$10$^{10.1\pm0.3}$ cm$^{-3}$, respectively. The reconnection site rises with a velocity of $\leqslant$69 km s$^{-1}$. The observational results suggest that plasmoid instability plays an important role in the energy release process of solar eruptions.

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted by A&A, 9 pages, and 5 figures

arXiv:2404.17513 [pdf, other]

A Comprehensive Evaluation on Event Reasoning of Large Language Models

Authors: Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Xiaoying Bai, Yue Fang, Haiyan Zhao, Jia Li, Chongyang Tao

Abstract: Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of the inter-event relations and the reasoning paradigms. How well LLMs accomplish event reasoning on various relations and reasoning paradigms remains unknown. To fill this gap, we comprehensively evaluate the event reasoning abilities of LLMs. We introduce a novel benchmark EV2 for EValuation of EVent reasoning. EV2 consists of two levels of evaluation, schema and instance, and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments on EV2. We find that LLMs are able to accomplish event reasoning, but their performance is far from satisfactory. We also notice an imbalance of event reasoning abilities in LLMs. Moreover, LLMs have event schema knowledge; however, they are not aligned with humans on how to utilize the knowledge. Based on these findings, we introduce two methods to guide the LLMs to utilize the event schema knowledge. Both methods achieve improvements.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17210 [pdf, other]

Studies on Topological High-fold Degenerate Semimetal with Chiral Structure

Authors: Yan Wang, Xiaosong Bai, Wujun Shi, Wenjian Liu, Qiunan Xu

Abstract: In recent years, a type of topological semimetals (TSMs) that can host new fermions with high-fold degeneracy has attracted considerable interest. Among them, those with chiral structure particularly catch our attention. Such chiral high-fold degenerate semimetals always have a larger topological charge and longer Fermi arcs, which bring about some special properties. In this work, we found 147 chiral materials with exotic fermions near the Fermi level by high-throughput calculation and screening. We selected some typical examples to analyse their topological properties, such as topological surface states (TSSs) and Berry curvature. Our results help provide a promising platform for exploring the physical properties of chiral fermions and the application of chiral TSMs.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16176 [pdf, other]

Unweighted Layered Graph Traversal

Authors: Xingjian Bai, Christian Coester, Romain Cosson

Abstract: Introduced by Papadimitriou and Yannakakis in 1989, layered graph traversal is an important problem in online algorithms and mobile computing that has been studied for several decades, and which now is essentially resolved in its original formulation. In this paper, we demonstrate that what appears to be an innocuous modification of the problem actually leads to a drastic (exponential) reduction of the competitive ratio. Specifically, we present an algorithm that is $O(\log^2 w)$-competitive for traversing unweighted layered graphs of width $w$. Our technique is based on a simple entropic regularizer, which evolves as the agent progresses in the layered graph. Our algorithm is randomized and simply maintains that at all layers, the probability distribution of the position of the mobile agent maximizes the entropic regularizer.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15584 [pdf]

Research on OPF control of three-phase four-wire low-voltage distribution network considering uncertainty

Authors: Rui Wang, Xiaoqing Bai, Shengquan Huang, Shoupu Wei

Abstract: As power systems become more complex and uncertain, low-voltage distribution networks face numerous challenges, including three-phase imbalances caused by asymmetrical loads and distributed energy resources. To address these issues, we propose a robust stochastic optimization (RSO) based optimal power flow (OPF) control method for three-phase, four-wire low-voltage distribution networks that considers uncertainty. Using historical data and deep learning classification methods, the proposed method simulates optimal system behaviour without requiring communication infrastructure. The simulation results verify that the proposed method effectively controls the voltage and current amplitude while minimizing the operational cost and keeping the three-phase imbalance within acceptable limits. The proposed method shows promise for managing uncertainties and optimizing performance in low-voltage distribution networks.

Submitted 23 April, 2024; originally announced April 2024.

Comments: systems optimization, robust optimization, local control

arXiv:2404.15264 [pdf, other]

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

Authors: Jiahe Li, Jiawei Zhang, Xiao Bai, ** Zheng, Xin Ning, Jun Zhou, Lin Gu

Abstract: Radiance fields have demonstrated impressive performance in synthesizing lifelike 3D talking heads. However, due to the difficulty in fitting steep appearance changes, the prevailing paradigm that presents facial motions by directly modifying point appearance may lead to distortions in dynamic regions. To tackle this challenge, we introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. Leveraging the point-based Gaussian Splatting, facial motions can be represented in our method by applying smooth and continuous deformations to persistent Gaussian primitives, without requiring to learn the difficult appearance change like previous methods. Due to this simplification, precise facial motions can be synthesized while keeping a highly intact facial feature. Under such a deformation paradigm, we further identify a face-mouth motion inconsistency that would affect the learning of detailed speaking motions. To address this conflict, we decompose the model into two branches separately for the face and inside mouth areas, therefore simplifying the learning tasks to help reconstruct more accurate motion and structure of the mouth region. Extensive experiments demonstrate that our method renders high-quality lip-synchronized talking head videos, with better facial fidelity and higher efficiency compared with previous methods.

Submitted 23 April, 2024; originally announced April 2024.

Comments: Project page: https://fictionarry.github.io/TalkingGaussian/

Showing 1–50 of 1,233 results for author: Bai, X