Search | arXiv e-print repository

Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback

Authors: Guipeng Xv, Xinyu Li, Ruobing Xie, Chen Lin, Chong Liu, Feng Xia, Zhanhui Kang, Leyu Lin

Abstract: Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate multi-modal noise, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities. To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss. Furthermore, DA-MRS implements Alignment guided by User preference to enhance task-specific item representation and Alignment guided by graded Item relations to provide finer-grained alignment. Extensive experiments verify that DA-MRS is a plug-and-play framework and achieves significant and consistent improvements across various datasets, backbone models, and noisy scenarios.

Submitted 18 June, 2024; originally announced June 2024.
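
For orientation, the abstract above builds on the Bayesian Personalized Ranking (BPR) loss. The following is a minimal sketch of the standard, non-denoised BPR loss only; the denoised variant proposed in DA-MRS is not reproduced here, and the function name `bpr_loss` and its list-based interface are illustrative assumptions, not taken from the paper.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean standard BPR loss over (positive, negative) score pairs.

    For each pair the loss is -log(sigmoid(pos - neg)), computed in the
    numerically stable softplus form log(1 + exp(-(pos - neg))).
    Hypothetical helper: this is plain BPR, not the DA-MRS denoised loss.
    """
    if not pos_scores or len(pos_scores) != len(neg_scores):
        raise ValueError("need two non-empty score lists of equal length")
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        # softplus(-diff) = max(-diff, 0) + log1p(exp(-|diff|))
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(pos_scores)


if __name__ == "__main__":
    # A well-ranked pair (positive scored higher) gives a small loss,
    # a mis-ranked pair gives a large one.
    print(bpr_loss([5.0], [0.0]))
    print(bpr_loss([0.0], [5.0]))
```

The loss shrinks as the observed item is scored above the sampled negative, which is why noisy observed feedback, the problem the abstract targets, can mislead it.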

arXiv:2406.01684 [pdf, other]

Color Glass Condensate meets High Twist Expansion

Authors: Yu Fu, Zhong-Bo Kang, Farid Salazar, Xin-Nian Wang, Hongxi Xing

Abstract: We establish the correspondence between two well-known frameworks for QCD multiple scattering in nuclear media: the Color Glass Condensate (CGC) and the High-Twist (HT) expansion formalism. We argue that a consistent matching between both frameworks, in their common domain of validity, is achieved by incorporating the sub-eikonal longitudinal momentum phase in the CGC formalism, which mediates the transition between coherent and incoherent scattering. We perform a detailed calculation and analysis of direct photon production in proton-nucleus scattering as a concrete example to establish the matching between HT and CGC up to twist-4, including initial- and final-state interactions, as well as their interferences. The techniques developed in this work can be adapted to other processes in electron-nucleus and proton-nucleus collisions, and they provide a potential avenue for a unified picture of dilute-dense dynamics in nuclear media.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 34 pages, 12 figures, 1 table

Report number: INT-PUB-24-024

arXiv:2405.15280 [pdf, other]

DFGNN: Dual-frequency Graph Neural Network for Sign-aware Feedback

Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Xu Zhang, Fuzhen Zhuang, Leyu Lin, Zhanhui Kang, Yongjun Xu

Abstract: Graph-based recommendation has achieved great success in recent years. However, most existing graph-based recommendation methods focus on capturing user preference based on positive edges/feedback, while ignoring negative edges/feedback (e.g., dislike, low rating) that widely exist in real-world recommender systems. How to utilize negative feedback in graph-based recommendation remains underexplored. In this study, we first conducted a comprehensive experimental analysis and found that (1) existing graph neural networks are not well-suited for modeling negative feedback, which acts as a high-frequency signal in a user-item graph, and (2) graph-based recommendation suffers from the representation degeneration problem. Based on these two observations, we propose a novel model that models positive and negative feedback from a frequency filter perspective, called Dual-frequency Graph Neural Network for Sign-aware Recommendation (DFGNN). Specifically, in DFGNN, the designed dual-frequency graph filter (DGF) captures both low-frequency and high-frequency signals that contain positive and negative feedback. Furthermore, the proposed signed graph regularization is applied to keep the user/item embeddings uniform in the embedding space to alleviate the representation degeneration problem. Additionally, we conduct extensive experiments on real-world datasets and demonstrate the effectiveness of the proposed model. Codes of our model will be released upon acceptance.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by KDD 2024 Research Track

arXiv:2405.14309 [pdf, other]

Gamma-ray Signal from $Z_{N\geq 3}$ Dark Matter-Companion Models

Authors: Jun Guo, Zhaofeng Kang, Ji-Gang Zhao

Abstract: In Ref.~\cite{Guo:2021rre}, we proposed to replace the final dark matter (DM) particle in the semi-annihilation mode $\rm DM+DM\to antiDM+Higgs~boson$ with its $Z_{N\geq 3}$ companion, thus reducing DM number density without DM-nucleon scattering. In this work, we study the indirect detection signals from DM annihilation, the Higgs boson pair with one of them from the companion decay being on- or off-shell, depending on the DM-companion mass splitting. We generate the photon spectrum by using PYTHIA8 and study the properties of the spectrum, to find that the hard part of the spectrum in our model is mainly shaped by the direct Higgs boson and thus does not differ much from that of the conventional semi-annihilation mode. Using the Fermi-LAT data of white dwarfs, we derive the current limit of the DM annihilation cross section for ${\rm DM+DM\to companion^*+Higgs~ boson}$, and for the relatively light DM, it reaches the typical thermal cross section. However, for the TeV scale DM, we have to rely on the Cherenkov Telescope Array, which is able to rule out the whole parameter space except for the coannihilation region.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 14 pages, 3 figures

arXiv:2405.05694 [pdf, other]

Matter Asymmetries in the $Z_N$ Dark matter-companion Models

Authors: Shao-Long Chen, Zhaofeng Kang, Ze-Kun Liu, Peng Zhang

Abstract: A class of $Z_{N\geq 3}$-symmetric WIMP dark matter models that are characterized by the semi-annihilation into the companion of dark matter has been proposed in ref.~\cite{Guo:2021rre}, providing a mechanism to evade the stringent direct detection constraint. In this work, we point out that such models naturally provide the three Sakharov elements necessary for dark matter asymmetry, and moreover this asymmetry can be transferred to the visible sector with a proper link to the leptonic or quark sector. In our minimal $Z_3$ example, the migration to the leptonic sector is via the asymmetric companion decay into neutrinos, and the lepton asymmetry can be further transferred to the quark sector. The CP violation parameter in the model is suppressed in the limit of static annihilation of dark matter, and the lift from thermal motion has been studied for the first time. A preliminary numerical analysis based on the Boltzmann equations shows that both correct relic density of dark matter and baryon asymmetry can be accommodated.

Submitted 6 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 25 pages, 14 figures

arXiv:2405.03562 [pdf, other]

ID-centric Pre-training for Recommendation

Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Fuzhen Zhuang, Xu Zhang, Leyu Lin, Zhanhui Kang, Yongjun Xu

Abstract: Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are difficult to transfer to new domains. With the rise of pre-trained language models (PLMs), some pioneering works adopt PLMs for pre-trained recommendation, where modality information (e.g., text) is considered universal across domains via PLM. Unfortunately, the behavioral information in ID embeddings is still verified to be dominating in PLM-based recommendation models compared to modality information and thus limits these models' performance. In this work, we propose a novel ID-centric recommendation pre-training paradigm (IDP), which directly transfers informative ID embeddings learned in pre-training domains to item representations in new domains. Specifically, in the pre-training stage, besides the ID-based sequential model for recommendation, we also build a Cross-domain ID-matcher (CDIM) learned by both behavioral and modality information. In the tuning stage, modality information of new domain items is regarded as a cross-domain bridge built by CDIM. We first leverage the textual information of downstream domain items to retrieve behaviorally and semantically similar items from pre-training domains using CDIM. Next, these retrieved pre-trained ID embeddings, rather than certain textual embeddings, are directly adopted to generate downstream new items' embeddings. Through extensive experiments on real-world datasets, both in cold and warm settings, we demonstrate that our proposed model significantly outperforms all baselines. Codes will be released upon acceptance.

Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.19502 [pdf, other]

Interplay between Vector-like Lepton and Seesaw Mechanism: Oblique Corrections

Authors: Shuyang Han, Zhaofeng Kang, Jiang Zhu

Abstract: The non-vanishing neutrino mass strongly hints at the existence of right-handed neutrinos (RHNs), singlets of the standard model (SM). However, they are highly decoupled from the SM and difficult to probe. In this work, we consider that the Majorana RHNs from the type-I seesaw mechanism may well mix with the heavy neutral lepton dwelling in a certain vector-like lepton (VLL), thus acquiring a sizable electroweak charge. Such a simple scenario yields many interesting consequences, and the imprint on oblique corrections, well expected from the mass splitting between components of VLL by virtue of VLL-RHN mixing, is our focus here. We analytically calculate the Peskin-Takeuchi parameters S, T and U with full details, carefully treating the Majorana loop to obtain self-consistent expressions free of divergence. Then, we constrain the VLL-RHN system, which only gives a sizable $T$ parameter, using the PDG-2021 data and CDF-II data, separately, by imposing $T\lesssim{\cal O}(0.1)$. It is found that the RHN and VLL below the TeV scale, with a properly large mixing, stand at the frontier of electroweak precision tests such as the W-boson mass.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.16697 [pdf, other]

High-Coherence Kerr-cat qubit in 2D architecture

Authors: Ahmed Hajr, Bingcheng Qing, Ke Wang, Gerwin Koolstra, Zahra Pedramrazi, Ziqi Kang, Larry Chen, Long B. Nguyen, Christian Junger, Noah Goss, Irwin Huang, Bibek Bhandari, Nicholas E. Frattini, Shruti Puri, Justin Dressel, Andrew N. Jordan, David Santiago, Irfan Siddiqi

Abstract: The Kerr-cat qubit is a bosonic qubit in which multi-photon Schrödinger cat states are stabilized by applying a two-photon drive to an oscillator with a Kerr nonlinearity. The suppressed bit-flip rate with increasing cat size makes this qubit a promising candidate to implement quantum error correction codes tailored for noise-biased qubits. However, achieving strong light-matter interactions necessary for stabilizing and controlling this qubit has traditionally required strong microwave drives that heat the qubit and degrade its performance. In contrast, increasing the coupling to the drive port removes the need for strong drives at the expense of large Purcell decay. By integrating an effective band-block filter on-chip, we overcome this trade-off and realize a Kerr-cat qubit in a scalable 2D superconducting circuit with high coherence. This filter provides 30 dB of isolation at the qubit frequency with negligible attenuation at the frequencies required for stabilization and readout. We experimentally demonstrate quantum non-demolition readout fidelity of 99.6% for a cat with 8 photons. Also, to have high-fidelity universal control over this qubit, we combine fast Rabi oscillations with a new demonstration of the X(90) gate through phase modulation of the stabilization drive. Finally, the lifetime in this architecture is examined as a function of the cat size of up to 10 photons in the oscillator, achieving a bit-flip time higher than 1 ms and only a linear decrease in the phase-flip time, in good agreement with the theoretical analysis of the circuit. Our qubit shows promise as a building block for fault-tolerant quantum processors with a small footprint.

Submitted 19 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.15704 [pdf, other]

Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

Abstract: Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limit improvements. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We give three detailed explanations of why this works, and experimental results demonstrate that our method improves performance more efficiently than traditional MMF. Furthermore, attribution analysis validates that the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.

Submitted 24 April, 2024; originally announced April 2024.

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

arXiv:2404.14721 [pdf, other]

Dynamically Anchored Prompting for Task-Imbalanced Continual Learning

Authors: Chenxing Hong, Yan Jin, Zhiqi Kang, Yizhou Chen, Mengke Li, Yang Lu, Hanzi Wang

Abstract: Existing continual learning literature relies heavily on a strong assumption that tasks arrive with a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning (TICL) scenarios where the distribution of task data is non-uniform across the whole learning process. We find that imbalanced tasks significantly challenge the capability of models to control the trade-off between stability and plasticity from the perspective of recent prompt-based continual learning methods. On top of the above finding, we propose Dynamically Anchored Prompting (DAP), a prompt-based method that only maintains a single general prompt to adapt to the shifts within a task stream dynamically. This general prompt is regularized in the prompt space with two specifically designed prompt anchors, called boosting anchor and stabilizing anchor, to balance stability and plasticity in TICL. Remarkably, DAP achieves this balance by only storing a prompt across the data stream, therefore offering a substantial advantage in rehearsal-free CL. Extensive experiments demonstrate that the proposed DAP results in 4.5% to 15% absolute improvements over state-of-the-art methods on benchmarks under task-imbalanced settings. Our code is available at https://github.com/chenxing6666/DAP

Submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.13892 [pdf, other]

doi 10.1145/3652583.3658086

Retrieval-Augmented Audio Deepfake Detection

Authors: Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

Abstract: With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired by retrieval-augmented generation (RAG), we propose a retrieval-augmented detection (RAD) framework that augments test samples with similar retrieved samples for enhanced detection. We also extend the multi-fusion attentive classifier to integrate it with our proposed RAD framework. Extensive experiments show the superior performance of the proposed RAD framework over baseline methods, achieving state-of-the-art results on the ASVspoof 2021 DF set and competitive results on the 2019 and 2021 LA sets. Further sample analysis indicates that the retriever consistently retrieves samples mostly from the same speaker with acoustic characteristics highly consistent with the query audio, thereby improving detection performance.

Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by the 2024 International Conference on Multimedia Retrieval (ICMR 2024)

arXiv:2404.11375 [pdf, other]

Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion

Authors: Xinghan Wang, Zixi Kang, Yadong Mu

Abstract: Human motion understanding is a fundamental task with diverse practical applications, facilitated by the availability of large-scale motion capture datasets. Recent studies focus on text-motion tasks, such as text-based motion generation, editing and question answering. In this study, we introduce the novel task of text-based human motion grounding (THMG), aimed at precisely localizing temporal segments corresponding to given textual descriptions within untrimmed motion sequences. Capturing global temporal information is crucial for the THMG task. However, transformer-based models that rely on global temporal self-attention face challenges when handling long untrimmed sequences due to the quadratic computational cost. We address these challenges by proposing Text-controlled Motion Mamba (TM-Mamba), a unified model that integrates temporal global context, language query control, and spatial graph topology with only linear memory cost. The core of the model is a text-controlled selection mechanism which dynamically incorporates global temporal information based on text query. The model is further enhanced to be topology-aware through the integration of relational embeddings. For evaluation, we introduce BABEL-Grounding, the first text-motion dataset that provides detailed textual descriptions of human actions along with their corresponding temporal segments. Extensive evaluations demonstrate the effectiveness of TM-Mamba on BABEL-Grounding.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.08796 [pdf, other]

The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation

Authors: Zekai Qu, Ruobing Xie, Chaojun Xiao, Xingwu Sun, Zhanhui Kang

Abstract: Sequential recommendation (SR) has seen significant advancements with the help of Pre-trained Language Models (PLMs). Some PLM-based SR models directly use PLM to encode user historical behavior's text sequences to learn user representations, while there is seldom an in-depth exploration of the capability and suitability of PLM in behavior sequence modeling. In this work, we first conduct extensive model analyses between PLMs and PLM-based SR models, discovering great underutilization and parameter redundancy of PLMs in behavior sequence modeling. Inspired by this, we explore different lightweight usages of PLMs in SR, aiming to maximally stimulate the ability of PLMs for SR while satisfying the efficiency and usability demands of practical systems. We discover that adopting behavior-tuned PLMs for item initializations of conventional ID-based SR models is the most economical framework of PLM-based SR, which would not bring in any additional inference cost but could achieve a dramatic performance boost compared with the original version. Extensive experiments on five datasets show that our simple and universal framework leads to significant improvement compared to classical SR and SOTA PLM-based SR models without additional inference costs.

Submitted 17 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.08793 [pdf, other]

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models

Authors: Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen

Abstract: The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential weaknesses. However, the complexity of evaluating jailbreak performance and understanding prompt characteristics makes this analysis laborious. We collaborate with domain experts to characterize problems and propose an LLM-assisted framework to streamline the analysis process. It provides automatic jailbreak assessment to facilitate performance evaluation and support analysis of components and keywords in prompts. Based on the framework, we design JailbreakLens, a visual analysis system that enables users to explore the jailbreak performance against the target model, conduct multi-level analysis of prompt characteristics, and refine prompt instances to verify findings. Through a case study, technical evaluations, and expert interviews, we demonstrate our system's effectiveness in helping users evaluate model security and identify model weaknesses.

Submitted 12 April, 2024; originally announced April 2024.

Comments: Submitted to VIS 2024

arXiv:2403.11116 [pdf, other]

PhD: A Prompted Visual Hallucination Evaluation Dataset

Authors: Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li

Abstract: The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs). The challenge of hallucination, prevalent in LLMs, also emerges in LVLMs. However, most existing efforts mainly focus on object hallucination in LVLMs, ignoring diverse types of LVLM hallucinations. In this study, we delve into the Intrinsic Vision-Language Hallucination (IVL-Hallu) issue, thoroughly analyzing different types of IVL-Hallu on their causes and reflections. Specifically, we propose several novel IVL-Hallu tasks and categorize them into four types: (a) object hallucination, which arises from the misidentification of objects, (b) attribute hallucination, which is caused by the misidentification of attributes, (c) multi-modal conflicting hallucination, which derives from the contradictions between textual and visual information, and (d) counter-common-sense hallucination, which is due to the contradictions between the LVLM knowledge and actual images. Based on these taxonomies, we propose a more challenging benchmark named PhD to evaluate and explore IVL-Hallu. An automated pipeline is proposed for generating different types of IVL-Hallu data. Extensive experiments on five SOTA LVLMs reveal their inability to effectively tackle our proposed IVL-Hallu tasks, with detailed analyses and insights on the origins and possible solutions of these new challenging IVL-Hallu tasks, facilitating future research on IVL-Hallu and LVLM. The benchmark can be accessed at https://github.com/jiazhen-code/IntrinsicHallu

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.03676 [pdf, other]

Simplified PCNet with Robustness

Authors: Bingheng Li, Xuanting Xie, Haoxiang Lei, Ruiyi Fang, Zhao Kang

Abstract: Graph Neural Networks (GNNs) have garnered significant attention for their success in learning the representation of homophilic or heterophilic graphs. However, they cannot generalize well to real-world graphs with different levels of homophily. In response, the Poisson-Charlier Network (PCNet) \cite{li2024pc}, the previous work, allows graph representation to be learned from heterophily to homophily. Although PCNet alleviates the heterophily issue, there remain some challenges in further improving the efficacy and efficiency. In this paper, we simplify PCNet and enhance its robustness. We first extend the filter order to continuous values and reduce its parameters. Two variants with adaptive neighborhood sizes are implemented. Theoretical analysis shows our model's robustness to graph structure perturbations or adversarial attacks. We validate our approach through semi-supervised learning tasks on various datasets representing both homophilic and heterophilic graphs.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 10 pages, 3 figures

arXiv:2403.03670 [pdf, other]

CDC: A Simple Framework for Complex Data Clustering

Authors: Zhao Kang, Xuanting Xie, Bingheng Li, Erlin Pan

Abstract: In today's data-driven digital era, the amount as well as the complexity (such as multi-view, non-Euclidean, and multi-relational structure) of the collected data are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are independently developed to handle one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first utilize graph filtering to fuse geometry structure and attribute information. We then reduce the complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving regularizer. We illustrate the cluster-ability of our proposed method theoretically and experimentally. In particular, we deploy CDC to graph data of size 111M.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 10 pages, 5 figures

arXiv:2403.03666 [pdf, other]

Provable Filter for Real-world Graph Clustering

Authors: Xuanting Xie, Erlin Pan, Zhao Kang, Wenyu Chen, Bingheng Li

Abstract: Graph clustering, an important unsupervised problem, has been shown to be more resistant to advances in Graph Neural Networks (GNNs). In addition, almost all clustering methods focus on homophilic graphs and ignore heterophily. This significantly limits their applicability in practice, since real-world graphs exhibit a structural disparity and cannot simply be classified as homophilic or heterophilic. Thus, a principled way to handle practical graphs is urgently needed. To fill this gap, we provide a novel solution with theoretical support. Interestingly, we find that most homophilic and heterophilic edges can be correctly identified on the basis of neighbor information. Motivated by this finding, we construct two graphs that are highly homophilic and heterophilic, respectively. They are used to build low-pass and high-pass filters to capture holistic information. Important features are further enhanced by the squeeze-and-excitation block. We validate our approach through extensive experiments on both homophilic and heterophilic graphs. Empirical results demonstrate the superiority of our method compared to state-of-the-art clustering methods.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 12 pages, 5 figures

arXiv:2403.03659 [pdf, other]

Robust Graph Structure Learning under Heterophily

Authors: Xuanting Xie, Zhao Kang, Wenyu Chen

Abstract: A graph is a fundamental mathematical structure for characterizing relations between different objects and has been widely used in various learning tasks. Most methods implicitly assume a given graph to be accurate and complete. However, real data is inevitably noisy and sparse, which leads to inferior results. Despite the remarkable success of recent graph representation learning methods, they inherently presume that the graph is homophilic, and largely overlook heterophily, where most connected nodes are from different classes. In this regard, we propose a novel robust graph structure learning method to achieve a high-quality graph from heterophilic data for downstream tasks. We first apply a high-pass filter to make each node more distinctive from its neighbors by encoding structure information into the node features. Then, we learn a robust graph with an adaptive norm characterizing different levels of noise. Afterwards, we propose a novel regularizer to further refine the graph structure. Clustering and semi-supervised classification experiments on heterophilic graphs verify the effectiveness of our method.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 26 pages, 5 figures

arXiv:2403.02775 [pdf, other]

EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs

Authors: Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang

Abstract: Large language models (LLMs) have proven to be far superior to conventional methods in various tasks. However, their expensive computations and high memory requirements are prohibitive for deployment. Model quantization is an effective method for reducing this overhead. The problem is that in most previous works, the quantized model was calibrated using a few samples from the training data, which might affect the generalization of the quantized LLMs to unknown cases and tasks. Hence in this work, we explore an important question: Can we design a data-independent quantization method for LLMs to guarantee its generalization performance? We propose EasyQuant, a training-free and data-independent weight-only quantization algorithm for LLMs. Our observation indicates that two factors, outliers in the weights and the quantization ranges, are essential for reducing the quantization error. Therefore, in EasyQuant, we leave the outliers (less than 1%) unchanged and optimize the quantization range to reduce the reconstruction error. With these methods, we surprisingly find that EasyQuant achieves comparable performance to the original model. Since EasyQuant does not depend on any training data, the generalization performance of quantized LLMs is safely guaranteed. Moreover, EasyQuant can be implemented in parallel so that the quantized model can be obtained in a few minutes even for LLMs over 100B. To the best of our knowledge, this is the first work that achieves almost lossless quantization performance for LLMs under a data-independent setting, and our algorithm runs over 10 times faster than the data-dependent methods.
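The two ingredients named in this abstract, keeping weight outliers unchanged and tuning the quantization range to reduce reconstruction error, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the EasyQuant algorithm itself: the n-sigma outlier rule, the grid search over candidate ranges, and all function names are hypothetical choices made here.

```python
import statistics

def quantize_dequantize(w, scale, bits=4):
    """Symmetric uniform quantization of one value, then dequantization."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))
    return q * scale

def reconstruction_error(weights, scale, bits=4):
    """Sum of squared differences between weights and their quantized form."""
    return sum((w - quantize_dequantize(w, scale, bits)) ** 2 for w in weights)

def quantize_with_outliers(weights, bits=4, n_sigma=3.0, grid=100):
    """Keep outliers in full precision and quantize the rest.

    Assumptions (not from the abstract): outliers are values more than
    n_sigma standard deviations from the mean, and the range is chosen
    by a simple grid search rather than a learned optimization.
    """
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)
    is_outlier = [abs(w - mean) > n_sigma * std for w in weights]
    inliers = [w for w, out in zip(weights, is_outlier) if not out]
    max_abs = max((abs(w) for w in inliers), default=0.0)
    if max_abs == 0.0:
        return list(weights), 1.0, is_outlier
    qmax = 2 ** (bits - 1) - 1
    # Start from the full range, then try progressively tighter ranges.
    best_scale = max_abs / qmax
    best_err = reconstruction_error(inliers, best_scale, bits)
    for i in range(1, grid):
        scale = (max_abs * i / grid) / qmax
        err = reconstruction_error(inliers, scale, bits)
        if err < best_err:
            best_scale, best_err = scale, err
    restored = [w if out else quantize_dequantize(w, best_scale, bits)
                for w, out in zip(weights, is_outlier)]
    return restored, best_scale, is_outlier

# Usage: 21 small weights plus one large outlier that is left untouched.
weights = [0.01 * i - 0.1 for i in range(21)] + [8.0]
restored, scale, mask = quantize_with_outliers(weights)
print(scale, mask.count(True))
```

Excluding the outlier before choosing the range matters because a single large weight would otherwise stretch the quantization grid and waste most of its levels on values that never occur.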

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01886 [pdf, other]

FCDS: Fusing Constituency and Dependency Syntax into Document-Level Relation Extraction

Authors: Xudong Zhu, Zhao Kang, Bei Hui

Abstract: Document-level Relation Extraction (DocRE) aims to identify relation labels between entities within a single document. It requires handling several sentences and reasoning over them. State-of-the-art DocRE methods use a graph structure to connect entities across the document to capture dependency syntax information. However, this is insufficient to fully exploit the rich syntax information in the document. In this work, we propose to fuse constituency and dependency syntax into DocRE. It uses constituency syntax to aggregate the whole sentence information and select the instructive sentences for the pairs of targets. It exploits the dependency syntax in a graph structure with constituency syntax enhancement and chooses the path between entity pairs based on the dependency graph. The experimental results on datasets from various domains demonstrate the effectiveness of the proposed method. The code is publicly available at this url.

Submitted 4 March, 2024; originally announced March 2024.

Comments: Appear in COLING 2024

arXiv:2402.18581 [pdf, other]

Multi-objective Optimal Roadside Units Deployment in Urban Vehicular Networks

Authors: Weian Guo, Zecheng Kang, Dongyang Li, Lun Zhang, Li Li

Abstract: The significance of transportation efficiency, safety, and related services is increasing in urban vehicular networks. Within such networks, roadside units (RSUs) serve as intermediates in facilitating communication. Therefore, the deployment of RSUs is of utmost importance in ensuring the quality of communication services. However, the optimization objectives, such as time delay and deployment cost, are commonly developed from diverse perspectives. As a result, conflicts may arise among the objectives. Furthermore, in urban environments, the presence of various obstacles, such as buildings, gardens, lakes, and other infrastructure, poses challenges for the deployment of RSUs. Hence, the deployment encounters significant difficulties due to the existence of multiple objectives, constraints imposed by obstacles, and the necessity to explore a large-scale optimization space. To address this issue, two versions of multi-objective optimization algorithms are proposed in this paper. By utilizing a multi-population strategy and an adaptive exploration technique, the methods efficiently explore a large-scale decision-variable space. In order to mitigate the issue of an overcrowded deployment of RSUs, a calibrating mechanism is adopted to adjust RSU density during the optimization procedures. The proposed methods also handle data offloading between vehicles and RSUs by setting up an iterative best response sequence game (IBRSG). A comparison of the proposed algorithms with several state-of-the-art algorithms demonstrates that our strategies perform better in both high-density and low-density urban scenarios. The results also indicate that the proposed solutions substantially improve the efficiency of vehicular networks.

Submitted 14 January, 2024; originally announced February 2024.

Comments: This manuscript has been submitted to the journal of IEEE Transactions on Vehicular Technology

arXiv:2402.13607 [pdf, other]

CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

Authors: Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

Abstract: Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpreted within a broader context. In this work, we introduce a new benchmark, named CODIS, designed to assess the ability of models to use context provided in free-form text to enhance visual comprehension. Our findings indicate that MLLMs consistently fall short of human performance on this benchmark. Further analysis confirms that these models struggle to effectively extract and utilize contextual information to improve their understanding of images. This underscores the pressing need to enhance the ability of MLLMs to comprehend visuals in a context-dependent manner. View our project website at https://thunlp-mt.github.io/CODIS.

Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.04883 [pdf, other]

Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration

Authors: Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Ruimao Zhang

Abstract: Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces, as well as the accuracy of object localization within the 3D space. This paper aims to address a fundamental problem of camera-based 3D object detection: how to effectively learn depth information for accurate feature lifting and object localization. Different from previous methods, which directly predict depth distributions by using a supervised estimation model, we propose a cascade framework consisting of two depth-aware learning paradigms. First, a depth estimation (DE) scheme leverages relative depth information to realize effective feature lifting from 2D to 3D spaces. Furthermore, a depth calibration (DC) scheme introduces depth reconstruction to further adjust the 3D object localization perturbation along the depth axis. In practice, the DE is explicitly realized by using both the absolute and relative depth optimization loss to promote the precision of depth prediction, while the capability of DC is implicitly embedded into the detection Transformer through a depth denoising mechanism in the training phase. The entire model is trained in an end-to-end manner. We propose a baseline detector and evaluate the effectiveness of our proposal with +2.2%/+2.7% NDS/mAP improvements on the NuScenes benchmark, and achieve a comparable performance with 55.9%/45.7% NDS/mAP. Furthermore, we conduct extensive experiments to demonstrate its generality based on various detectors with about +2% NDS improvements.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Accepted to ICRA2024

arXiv:2402.01516 [pdf, other]

Cross-view Masked Diffusion Transformers for Person Image Synthesis

Authors: Trung X. Pham, Zhang Kang, Chang D. Yoo

Abstract: We present X-MDPT ($\underline{Cross}$-view $\underline{M}$asked $\underline{D}$iffusion $\underline{P}$rediction $\underline{T}$ransformers), a novel diffusion model designed for pose-guided human image generation. X-MDPT distinguishes itself by employing masked diffusion transformers that operate on latent patches, a departure from the commonly-used Unet structures in existing works. The model comprises three key modules: 1) a denoising diffusion Transformer, 2) an aggregation network that consolidates conditions into a single vector for the diffusion process, and 3) a mask cross-prediction module that enhances representation learning with semantic information from the reference image. X-MDPT demonstrates scalability, improving FID, SSIM, and LPIPS with larger models. Despite its simple design, our model outperforms state-of-the-art approaches on the DeepFashion dataset while exhibiting efficiency in terms of training parameters, training time, and inference speed. Our compact 33MB model achieves an FID of 7.42, surpassing a prior Unet latent diffusion approach (FID 8.07) while using $11\times$ fewer parameters. Our best model surpasses the pixel-based diffusion with $\frac{2}{3}$ of the parameters and achieves $5.43 \times$ faster inference. The code is available at https://github.com/trungpx/xmdpt.

Submitted 3 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2401.03849 [pdf, other]

Confinement Bubble Wall Velocity via Quasiparticle Determination

Authors: Zhaofeng Kang, Jiang Zhu

Abstract: Lattice simulations reveal that the deconfinement-confinement (D-C) phase transition (PT) of the hot pure $SU(N>2)$ Yang-Mills system is first order. This system can be described by a pool of quasigluons moving in the Polyakov loop background, and in this picture, we establish an effective distribution function for quasigluons, which encodes interactions among quasigluons and in particular the confinement effect. With it, we make the first attempt to calculate the confinement bubble wall velocity $v_w$ at the microscopic level, and we obtain a small velocity $v_w\sim 0.04$ using two different approaches, which is qualitatively consistent with other results, such as those from holography.

Submitted 30 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 7 pages, 3 figures

arXiv:2401.02913 [pdf, other]

Plug-in Diffusion Model for Sequential Recommendation

Authors: Haokai Ma, Ruobing Xie, Lei Meng, Xin Chen, Xu Zhang, Leyu Lin, Zhanhui Kang

Abstract: Pioneering efforts have verified the effectiveness of diffusion models in exploring the informative uncertainty for recommendation. Considering the difference between recommendation and image synthesis tasks, existing methods have undertaken tailored refinements to the diffusion and reverse process. However, these approaches typically use the highest-score item in the corpus for user interest prediction, ignoring the user's generalized preference contained within other items, and thereby remain constrained by the data sparsity issue. To address this issue, this paper presents a novel Plug-in Diffusion Model for Recommendation (PDRec) framework, which employs the diffusion model as a flexible plugin to take full advantage of the diffusion-generated user preferences on all items. Specifically, PDRec first infers the users' dynamic preferences on all items via a time-interval diffusion model and proposes a Historical Behavior Reweighting (HBR) mechanism to identify the high-quality behaviors and suppress noisy behaviors. In addition to the observed items, PDRec proposes a Diffusion-based Positive Augmentation (DPA) strategy to leverage the top-ranked unobserved items as the potential positive samples, bringing in informative and diverse soft signals to alleviate data sparsity. To alleviate the false negative sampling issue, PDRec employs Noise-free Negative Sampling (NNS) to select stable negative samples for ensuring effective model optimization. Extensive experiments and analyses on four datasets have verified the superiority of the proposed PDRec over the state-of-the-art baselines and showcased the universality of PDRec as a flexible plugin for commonly-used sequential encoders in different recommendation scenarios. The code is available at https://github.com/hulkima/PDRec.

Submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2401.01941 [pdf, other]

The DIS 1-Jettiness Event Shape at N$^3$LL+${\cal O}(α_s^2)$

Authors: Haotian Cao, Zhong-Bo Kang, Xiaohui Liu, Sonny Mantry

Abstract: We present results for the $τ_1$ and $τ_{1a}$ 1-Jettiness global event shape distributions, for Deep Inelastic Scattering (DIS), at the N$^3$LL + ${\cal O}(α_s^2)$ level of accuracy. These event-shape distributions quantify and characterize the pattern of final state radiation in electron-nucleus collisions. They can be used as a probe of nuclear structure functions, nuclear medium effects in jet production, and for a precision extraction of the QCD strong coupling. The results presented here, along with the corresponding numerical codes, can be used for analyses with HERA data, in EIC simulation studies, and for eventual comparison with real EIC data.

Submitted 21 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: 42 pages, 8 figures, references added, new appendix with extended discussion on shape function added, version to appear in Physical Review D

arXiv:2312.17484 [pdf, other]

Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning

Authors: Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu

Abstract: Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8% to 74.5% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset.

Submitted 14 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: Accepted as AAAI 2024

arXiv:2312.14438 [pdf, other]

PC-Conv: Unifying Homophily and Heterophily with Two-fold Filtering

Authors: Bingheng Li, Erlin Pan, Zhao Kang

Abstract: Recently, many carefully crafted graph representation learning methods have achieved impressive performance on either strong heterophilic or homophilic graphs, but not both. Therefore, they are incapable of generalizing well across real-world graphs with different levels of homophily. This is attributed to their neglect of homophily in heterophilic graphs, and vice versa. In this paper, we propose a two-fold filtering mechanism to extract homophily in heterophilic graphs and vice versa. In particular, we extend the graph heat equation to perform heterophilic aggregation of global information from a long distance. The resultant filter can be exactly approximated by the Poisson-Charlier (PC) polynomials. To further exploit information at multiple orders, we introduce a powerful graph convolution PC-Conv and its instantiation PCNet for the node classification task. Compared with state-of-the-art GNNs, PCNet shows competitive performance on well-known homophilic and heterophilic graphs. Our implementation is available at https://github.com/uestclbh/PC-Conv.

Submitted 22 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2312.14066 [pdf, other]

Upper Bounding Barlow Twins: A Novel Filter for Multi-Relational Clustering

Authors: Xiaowei Qian, Bingheng Li, Zhao Kang

Abstract: Multi-relational clustering is a challenging task due to the fact that diverse semantic information conveyed in multi-layer graphs is difficult to extract and fuse. Recent methods integrate topology structure and node attribute information through graph filtering. However, they often use a low-pass filter without fully considering the correlation among multiple graphs. To overcome this drawback, we propose to learn a graph filter motivated by the theoretical analysis of Barlow Twins. We find that input with a negative semi-definite inner product provides a lower bound for Barlow Twins loss, which prevents it from reaching a better solution. We thus learn a filter that yields an upper bound for Barlow Twins. Afterward, we design a simple clustering architecture and demonstrate its state-of-the-art performance on four benchmark datasets.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.09226 [pdf, other]

Nuclear modified transverse momentum dependent parton distribution and fragmentation functions

Authors: Mishary Alrashed, Zhong-Bo Kang, John Terry, Hongxi Xing, Congyue Zhang

Abstract: In this study, we extend our previous global analysis of nuclear-modified transverse momentum distribution functions (nTMDs) to also consider the nuclear-modified collinear fragmentation function. Our methodology incorporates the global set of experimental data from both Drell-Yan production and Semi-Inclusive Deep Inelastic Scattering. Through a comprehensive global extraction of these distributions, we demonstrate the effectiveness of this extension by strongly describing the entire global dataset. A focal point of this paper is the impact of recent Jefferson Lab measurements. Most notably, to simultaneously describe experimental data at Jefferson Lab and HERMES we find that it is necessary to introduce a parameter which accounts for the non-perturbative scale evolution of the nTMDs. Additionally, we assess the kinematic coverage of the experimental data and provide insights into experimental opportunities at Jefferson Lab, future Electron-Ion Colliders, RHIC, and the LHC. These opportunities have the potential to significantly enhance and refine global analyses of nuclear-modified TMDs, contributing to a deeper understanding of the structure of cold nuclear matter.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 39 pages, 16 figures

Report number: LA-UR-23-33438

arXiv:2311.17142 [pdf, other]

doi 10.1103/PhysRevD.109.094012

Transverse Energy-Energy Correlators in the Color-Glass Condensate at the Electron-Ion Collider

Authors: Zhong-Bo Kang, Jani Penttala, Fanyi Zhao, Yiyu Zhou

Abstract: We investigate the transverse energy-energy correlators (TEEC) in the small-$x$ regime at the upcoming Electron-Ion Collider (EIC). Focusing on the back-to-back production of electron-hadron pairs in both $ep$ and $eA$ collisions, we establish a factorization theorem given in terms of the hard function, quark distributions, soft functions, and TEEC jet functions, where the gluon saturation effect is incorporated. Numerical results for TEEC in both $ep$ and $eA$ collisions are presented, together with the nuclear modification factor $R_A$. Our analysis reveals that TEEC observables in deep inelastic scattering provide a valuable approach for probing gluon saturation phenomena. Our findings underscore the significance of measuring TEEC at the EIC, emphasizing its efficacy in advancing our understanding of gluon saturation and nuclear modifications in high-energy collisions.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 11 pages, 3 figures

Report number: MIT-CTP/5649

Journal ref: Phys. Rev. D 109, 094012 (2024)

arXiv:2311.01033 [pdf, other]

Non-Autoregressive Diffusion-based Temporal Point Processes for Continuous-Time Long-Term Event Prediction

Authors: Wang-Tao Zhou, Zhao Kang, Ling Tian

Abstract: Continuous-time long-term event prediction plays an important role in many application scenarios. Most existing works rely on autoregressive frameworks to predict event sequences, which suffer from error accumulation, thus compromising prediction quality. Inspired by the success of denoising diffusion probabilistic models, we propose a diffusion-based non-autoregressive temporal point process model for long-term event prediction in continuous time. Instead of generating events one at a time in an autoregressive way, our model predicts the future event sequence as a whole. In order to perform diffusion processes on event sequences, we develop a bidirectional map between target event sequences and the Euclidean vector space. Furthermore, we design a novel denoising network to capture both sequential and contextual features for better sample quality. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods on long-term event prediction in continuous time. To the best of our knowledge, this is the first work to apply diffusion methods to long-term event prediction problems.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2311.00672 [pdf, other]

doi 10.1007/JHEP03(2024)142

Polarized fragmenting jet functions in Inclusive and Exclusive Jet Production

Authors: Zhong-Bo Kang, Hongxi Xing, Fanyi Zhao, Yiyu Zhou

Abstract: In this work, we present a complete theoretical framework for analyzing the distribution of polarized hadrons within jets, with and without measuring the transverse momentum relative to the standard jet axis. Using soft-collinear effective theory (SCET), we derive the factorization and provide the theoretical calculation of both semi-inclusive and exclusive fragmenting jet functions (FJFs) under longitudinal and transverse polarization. With the polarized FJFs, one gains access to a variety of new observables that can be used for extracting both collinear and transverse momentum dependent parton distribution functions (PDFs) and fragmentation functions (FFs). As examples, we provide numerical results for the spin asymmetry $A_{TU,T}^{\cos(\phi_S - \hat{\phi}_{S_h})}$ from polarized semi-inclusive hadron-in-jet production in polarized $pp$ collisions at RHIC kinematics, where a transversely polarized quark would lead to the transverse spin of the final-state hadron inside the jet and is thus sensitive to the transversity fragmentation functions. Similarly, another spin asymmetry, $A_{TU,L}^{\cos(\phi_q - \phi_{S})}$, from polarized exclusive hadron-in-jet production in polarized $ep$ collisions at EIC kinematics would allow us to access the helicity fragmentation functions. These observables demonstrate promising potential in investigating transverse momentum dependent PDFs and FFs and are worthwhile for further measurements.

Submitted 5 May, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: 49 pages, 9 figures

Report number: MIT-CTP/5633

Journal ref: JHEP 03, 142 (2024)

arXiv:2310.15929 [pdf, other]

E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity

Authors: Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang

Abstract: Traditional pruning methods are known to be challenging to work in Large Language Models (LLMs) for Generative AI because of their unaffordable training process and large computational demands. For the first time, we introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse, to improve the accuracy of N:M sparsity on LLM. E-Sparse employs the information richness to leverage the channel importance, and further incorporates several novel techniques to put it into effect: (1) it introduces information entropy to enhance the significance of parameter weights and input feature norms as a novel pruning metric, and performs N:M sparsity without modifying the remaining weights. (2) it designs global naive shuffle and local block shuffle to quickly optimize the information distribution and adequately cope with the impact of N:M sparsity on LLMs' accuracy. E-Sparse is implemented as a Sparse-GEMM on FasterTransformer and runs on NVIDIA Ampere GPUs. Extensive experiments on the LLaMA family and OPT models show that E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.

Submitted 22 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.15159 [pdf, other]

doi 10.1007/JHEP03(2024)153

Probing Transverse Momentum Dependent Structures with Azimuthal Dependence of Energy Correlators

Authors: Zhong-Bo Kang, Kyle Lee, Ding Yu Shao, Fanyi Zhao

Abstract: We study the azimuthal angle dependence of the energy-energy correlators $\langle \mathcal{E}(\hat{n}_1)\mathcal{E}(\hat{n}_2)\rangle$ in the back-to-back region for $e^+e^-$ annihilation and deep inelastic scattering (DIS) processes with general polarization of the proton beam. We demonstrate that the polarization information of the beam and the underlying partons from the hard scattering is propagated into the azimuthal angle dependence of the energy-energy correlators. In the process, we define the Collins-type EEC jet functions and introduce a new EEC observable using the lab-frame angles in the DIS process. Furthermore, we extend our formalism to explore the two-point energy correlation between hadrons with different quantum numbers $\mathbb{S}_i$ in the back-to-back limit $\langle \mathcal{E}_{\mathbb{S}_1}(\hat{n}_1)\mathcal{E}_{\mathbb{S}_2}(\hat{n}_2)\rangle$. We find that in the Operator Product Expansion (OPE) region the nonperturbative information is entirely encapsulated by a single number. Using our formalism, we present several phenomenological studies that showcase how energy correlators can be used to probe transverse momentum dependent structures.

Submitted 23 October, 2023; originally announced October 2023.

Comments: 36 pages, 7 figures

Report number: MIT-CTP/5632

Journal ref: JHEP 03, 153 (2024)

arXiv:2310.13540 [pdf, other]

Thoroughly Modeling Multi-domain Pre-trained Recommendation as Language

Authors: Zekai Qu, Ruobing Xie, Chaojun Xiao, Yuan Yao, Zhiyuan Liu, Fengzong Lian, Zhanhui Kang, Jie Zhou

Abstract: With pre-trained language models (PLMs) thriving and widely verified in various NLP tasks, pioneering efforts attempt to explore the possible cooperation of the general textual information in PLMs with the personalized behavioral information in user historical behavior sequences to enhance sequential recommendation (SR). However, despite the commonalities of input format and task goal, there are huge gaps between the behavioral and textual information, which obstruct thoroughly modeling SR as language modeling via PLM. To bridge the gap, we propose a novel Unified pre-trained language model enhanced sequential recommendation (UPSR), aiming to build a unified pre-trained recommendation model for multi-domain recommendation tasks. We formally design five key indicators, namely naturalness, domain consistency, informativeness, noise & ambiguity, and text length, to guide the text-item adaptation and behavior sequence-text sequence adaptation differently for pre-training and fine-tuning stages, which are essential but under-explored by previous works. In experiments, we conduct extensive evaluations on seven datasets with both tuning and zero-shot settings and achieve the overall best performance. Comprehensive model analyses also provide valuable insights for behavior modeling via PLM, shedding light on large pre-trained recommendation models. The source codes will be released in the future.

Submitted 27 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.12847 [pdf, other]

Correspondence between Color Glass Condensate and High-Twist Formalism

Authors: Yu Fu, Zhong-Bo Kang, Farid Salazar, Xin-Nian Wang, Hongxi Xing

Abstract: The Color Glass Condensate (CGC) effective theory and the collinear factorization at high-twist (HT) are two well-known frameworks describing perturbative QCD multiple scatterings in nuclear media. It has long been recognized that these two formalisms have their own domain of validity in different kinematic regions. Taking direct photon production in proton-nucleus collisions as an example, we clarify for the first time the relation between CGC and HT at the level of a physical observable. We show that the CGC formalism beyond the shock-wave approximation, and with the Landau-Pomeranchuk-Migdal interference effect, is consistent with the HT formalism in the transition region where they overlap. Such a unified picture paves the way for mapping out the phase diagram of parton density in nuclear medium from dilute to dense region.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 7 pages, 3 figures + supplemental material

arXiv:2310.12102 [pdf, other]

doi 10.1007/JHEP03(2024)027

Direct quarkonium-plus-gluon production in DIS in the Color Glass Condensate

Authors: Zhong-Bo Kang, Emilie Li, Farid Salazar

Abstract: We compute the differential cross-section for direct quarkonium production accompanied by a gluon in high-energy deep inelastic scattering (DIS) at small-$x$. We employ the Non-Relativistic QCD factorization framework, focusing on the $S$-wave contribution to the formation of the quarkonium, and including both color singlet and octet contributions. Our short distance coefficients for the production of the heavy quark pair are obtained within the Color Glass Condensate effective field theory. Our results pave the way towards the next-to-leading order computation of direct quarkonium in DIS, as well as the study of azimuthal correlations of direct quarkonium and jet.

Submitted 18 October, 2023; originally announced October 2023.

Comments: 52 pages, 7 figures, 1 table

Journal ref: JHEP 03, 027 (2024)

arXiv:2310.04681 [pdf, other]

VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model

Authors: Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

Abstract: Speaker verification (SV) performance deteriorates as utterances become shorter. To this end, we propose a new architecture called VoiceExtender which provides a promising solution for improving SV performance when handling short-duration speech signals. We use two guided diffusion models, the built-in and the external speaker embedding (SE) guided diffusion model, both of which utilize a diffusion model-based sample generator that leverages SE guidance to augment the speech features based on a short utterance. Extensive experimental results on the VoxCeleb1 dataset show that our method outperforms the baseline, with relative improvements in equal error rate (EER) of 46.1%, 35.7%, 10.4%, and 5.7% for the short utterance conditions of 0.5, 1.0, 1.5, and 2.0 seconds, respectively.

Submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted by the 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2023)

arXiv:2310.02576 [pdf, other]

doi 10.1007/s11063-024-11466-7

A Prototype-Based Neural Network for Image Anomaly Detection and Localization

Authors: Chao Huang, Zhao Kang, Hong Wu

Abstract: Image anomaly detection and localization perform not only image-level anomaly classification but also locate pixel-level anomaly regions. Recently, it has received much research attention due to its wide application in various fields. This paper proposes ProtoAD, a prototype-based neural network for image anomaly detection and localization. First, the patch features of normal images are extracted by a deep network pre-trained on natural images. Then, the prototypes of the normal patch features are learned by non-parametric clustering. Finally, we construct an image anomaly localization network (ProtoAD) by appending the feature extraction network with $L2$ feature normalization, a $1\times1$ convolutional layer, a channel max-pooling, and a subtraction operation. We use the prototypes as the kernels of the $1\times1$ convolutional layer; therefore, our neural network does not need a training phase and can conduct anomaly detection and localization in an end-to-end manner. Extensive experiments on two challenging industrial anomaly detection datasets, MVTec AD and BTAD, demonstrate that ProtoAD achieves competitive performance compared to the state-of-the-art methods with a higher inference speed. The source code is available at: https://github.com/98chao/ProtoAD.

Submitted 25 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Published in Neural Processing Letters 2024

Journal ref: Neural Process Lett 56, 169 (2024)
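
The scoring head described in the ProtoAD abstract above (L2 normalization, a 1x1 convolution whose kernels are the prototypes, channel max-pooling, and a subtraction) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `l2_normalize`, `anomaly_map`, and `image_score` are hypothetical, the subtraction is assumed to be `1 - max similarity`, the prototypes are assumed to be L2-normalized as well, and taking the maximum patch score as the image-level score is likewise an assumption.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def anomaly_map(patch_features, prototypes):
    """Score every patch of an H x W grid of feature vectors.

    A 1x1 convolution whose kernels are the prototypes reduces to a dot
    product between each patch feature and each prototype. Channel
    max-pooling keeps the best-matching prototype, and the subtraction
    (assumed here to be 1 - similarity) turns similarity into a score.
    """
    protos = [l2_normalize(p) for p in prototypes]  # assumption: unit-norm kernels
    scores = []
    for row in patch_features:
        score_row = []
        for feat in row:
            f = l2_normalize(feat)
            sims = [sum(a * b for a, b in zip(f, p)) for p in protos]
            score_row.append(1.0 - max(sims))
        scores.append(score_row)
    return scores


def image_score(amap):
    """Image-level score: the largest patch score (an assumption, not from the abstract)."""
    return max(max(row) for row in amap)


# Toy example: two prototypes, one row of two patches.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
features = [[[2.0, 0.0], [1.0, 1.0]]]
amap = anomaly_map(features, prototypes)
```

In this toy input the first patch coincides in direction with a prototype and so scores near zero, while the second lies between the two prototypes and scores higher. No training step appears anywhere, which matches the abstract's point that the prototypes are used directly as fixed kernels.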

arXiv:2309.13629 [pdf, ps, other]

doi 10.1088/1538-3873/acf15e

Periodic Variable Star Classification with Deep Learning: Handling Data Imbalance in an Ensemble Augmentation Way

Authors: Zihan Kang, Yanxia Zhang, Jingyi Zhang, Changhua Li, Minzhi Kong, Yongheng Zhao, Xue-Bing Wu

Abstract: Time-domain astronomy is progressing rapidly with the ongoing and upcoming large-scale photometric sky surveys led by the Vera C. Rubin Observatory project (LSST). Billions of variable sources call for better automatic classification algorithms for light curves. Among them, periodic variable stars are frequently studied. Different categories of periodic variable stars have a high degree of class imbalance and pose a challenge to algorithms including deep learning methods. We design two kinds of architectures of neural networks for the classification of periodic variable stars in the Catalina Survey's Data Release 2: a multi-input recurrent neural network (RNN) and a compound network combining the RNN and the convolutional neural network (CNN). To deal with class imbalance, we apply Gaussian Process to generate synthetic light curves with artificial uncertainties for data augmentation. For better performance, we organize the augmentation and training process in a "bagging-like" ensemble learning scheme. The experimental results show that the better approach is the compound network combining RNN and CNN, which reaches the best result of 86.2% on the overall balanced accuracy and 0.75 on the macro F1 score. We develop the ensemble augmentation method to solve the data imbalance when classifying variable stars and prove the effectiveness of combining different representations of light curves in a single model. The proposed methods would help build better classification algorithms of periodic time series data for future sky surveys (e.g., LSST).

Submitted 24 September, 2023; originally announced September 2023.

Comments: 10 pages, 8 figures, accepted

Journal ref: PASP 135 094501 (2023)

arXiv:2309.07111 [pdf, other]

doi 10.1038/s41467-023-43365-1

Anomalous excitonic phase diagram in band-gap-tuned Ta2Ni(Se,S)5

Authors: Cheng Chen, Weichen Tang, Xiang Chen, Zhibo Kang, Shuhan Ding, Kirsty Scott, Siqi Wang, Zhenglu Li, Jacob P. C. Ruff, Makoto Hashimoto, Dong-Hui Lu, Chris Jozwiak, Aaron Bostwick, Eli Rotenberg, Eduardo H. da Silva Neto, Robert J. Birgeneau, Yulin Chen, Steven G. Louie, Yao Wang, Yu He

Abstract: During a band-gap-tuned semimetal-to-semiconductor transition, Coulomb attraction between electrons and holes can cause spontaneously formed excitons near the zero-band-gap point, or the Lifshitz transition point. This has become an important route to realize bulk excitonic insulators -- an insulating ground state distinct from single-particle band insulators. How this route manifests from weak to strong coupling is not clear. In this work, using angle-resolved photoemission spectroscopy (ARPES) and high-resolution synchrotron x-ray diffraction (XRD), we investigate the broken symmetry state across the semimetal-to-semiconductor transition in a leading bulk excitonic insulator candidate system Ta2Ni(Se,S)5. A broken symmetry phase is found to be continuously suppressed from the semimetal side to the semiconductor side, contradicting the anticipated maximal excitonic instability around the Lifshitz transition. Bolstered by first-principles and model calculations, we find strong interband electron-phonon coupling to play a crucial role in the enhanced symmetry breaking on the semimetal side of the phase diagram. Our results not only provide insight into the longstanding debate of the nature of intertwined orders in Ta2NiSe5, but also establish a basis for exploring band-gap-tuned structural and electronic instabilities in strongly coupled systems.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 27 pages, 4 + 9 figures

Journal ref: Nat Commun 14, 7512 (2023)

arXiv:2309.07084 [pdf, other]

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

Authors: Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang

Abstract: In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance. Our strategy involves a data enhancement method named Polar Sampling, which densifies sparse objects and trains an assistant model to generate high-quality features as the supervision. These features are then used to train the LiDAR-Camera fusion model, where the fusion feature is optimized to simulate the generated high-quality features. Furthermore, we propose a simple yet effective deep fusion module, which continuously gains superior performance compared with previous fusion methods with the SupFusion strategy. In such a manner, our proposal shares the following advantages. Firstly, SupFusion introduces auxiliary feature-level supervision which could boost LiDAR-Camera detection performance without introducing extra inference costs. Secondly, the proposed deep fusion could continuously improve the detector's abilities. Our proposed SupFusion and deep fusion module are plug-and-play, and we conduct extensive experiments to demonstrate their effectiveness. Specifically, we gain around 2% 3D mAP improvements on the KITTI benchmark based on multiple LiDAR-Camera 3D detectors.

Submitted 31 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted to ICCV2023

arXiv:2308.14323 [pdf]

Institutional mapping and causal analysis of avalanche vulnerable areas based on multi-source data

Authors: Zexuan Zhou, Bingqi Ma, Jianwei Zhu, Zhizhong Kang

Abstract: Avalanche disaster is a major natural disaster that seriously threatens national infrastructure and personnel's life safety. For a long time, research on avalanche disaster prediction worldwide has been insufficient: there are only some basic models and basic conditions of occurrence, and there are no long-series, wide-range avalanche disaster prediction products. Based on 7 different bands and different types of multi-source remote sensing data, this study combined existing avalanche occurrence models, field investigation and statistical data to analyze the causes of avalanches. The U-net convolutional neural network and threshold analysis were used to extract the distribution of long time series avalanche-prone areas in two study areas, Heiluogou in Sichuan Province and along the Zangpo River in Palong, Tibet Autonomous Region. In addition, the relationship between earthquake magnitude and spatial distribution and avalanche occurrence is also analyzed in this study. This study will also continue to build a prior knowledge base of avalanche occurrence conditions, improve the prediction accuracy of the two methods, and produce long time series, interannual products of avalanche-prone areas in southwest China, including Sichuan Province, Yunnan Province, and Tibet Autonomous Region. The resulting products will provide high-precision avalanche prediction and safety assurance for engineering construction and mountaineering activities in Southwest China.

Submitted 28 August, 2023; originally announced August 2023.

Comments: 19 pages, 13 figures

arXiv:2308.01217 [pdf, other]

TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval

Authors: Kaibin Tian, Ruixiang Zhao, Hu Hu, Runquan Xie, Fengzong Lian, Zhanhui Kang, Xirong Li

Abstract: For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods are dominating. Compared to CLIP4Clip which is efficient and compact, the state-of-the-art models tend to compute video-text similarity by fine-grained cross-modal feature interaction and matching, putting their scalability for large-scale T2VR into doubt. For efficient T2VR, we propose TeachCLIP with multi-grained teaching to let a CLIP4Clip based student network learn from more advanced yet computationally heavy models such as X-CLIP, TS2-Net and X-Pool. To improve the student's learning capability, we add an Attentional frame-Feature Aggregation (AFA) block, which by design adds no extra storage/computation overhead at the retrieval stage. While attentive weights produced by AFA are commonly used for combining frame-level features, we propose a novel use of the weights to let them imitate frame-text relevance estimated by the teacher network. As such, AFA provides a fine-grained learning (teaching) channel for the student (teacher). Extensive experiments on multiple public datasets justify the viability of the proposed method.

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2308.00336 [pdf, other]

doi 10.3390/universe9070330

Blinkverse: A Database of Fast Radio Bursts

Authors: Jiaying Xu, Yi Feng, Di Li, Pei Wang, Yongkun Zhang, Jintao Xie, Huaxi Chen, Han Wang, Zhixuan Kang, Jingjing Hu, Yun Zheng, Chao-Wei Tsai, Xianglei Chen, Dengke Zhou

Abstract: The volume of research on fast radio burst (FRB) observations has been seeing dramatic growth. To facilitate the systematic analysis of the FRB population, we established a database platform, Blinkverse (https://blinkverse.alkaidos.cn), as a central inventory of FRBs from various observatories and with published properties, particularly dynamic spectra from FAST, CHIME, GBT, Arecibo, etc. Blinkverse thus not only forms a superset of FRBCAT, TNS, and CHIME/FRB, but also provides convenient access to thousands of FRB dynamic spectra from FAST, some of which were not available before. Blinkverse is regularly maintained and will be updated by external users in the future. Data entries of FRBs can be retrieved through parameter searches by FRB location, fluence, etc., and their logical combinations. Interactive visualization was built into the platform. We analyzed the energy distribution, period analysis, and classification of FRBs based on data downloaded from Blinkverse. The energy distributions of repeaters and non-repeaters are found to be distinct from one another.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 13 pages, 9 figures

Journal ref: Universe 2023, 9(7), 330

arXiv:2307.12286 [pdf, ps, other]

Double-Active-IRS Aided Wireless Communication: Deployment Optimization and Capacity Scaling

Authors: Zhenyu Kang, Changsheng You, Rui Zhang

Abstract: In this letter, we consider a double-active-intelligent reflecting surface (IRS) aided wireless communication system, where two active IRSs are properly deployed to assist the communication from a base station (BS) to multiple users located in a given zone via the double-reflection links. Under the assumption of fixed per-element amplification power for each active-IRS element, we formulate a rate maximization problem subject to practical constraints on the reflection design, elements allocation, and placement of active IRSs. To solve this non-convex problem, we first obtain the optimal active-IRS reflections and BS beamforming, based on which we then jointly optimize the active-IRS elements allocation and placement by using the alternating optimization (AO) method. Moreover, we show that given the fixed per-element amplification power, the received signal-to-noise ratio (SNR) at the user increases asymptotically with the square of the number of reflecting elements; while given the fixed number of reflecting elements, the SNR does not increase with the per-element amplification power when it is asymptotically large. Last, numerical results are presented to validate the effectiveness of the proposed AO-based algorithm and compare the rate performance of the considered double-active-IRS aided wireless system with various benchmark systems.

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2307.06935 [pdf, other]

Collins-type Energy-Energy Correlators and Nucleon Structure

Authors: Zhong-Bo Kang, Kyle Lee, Ding Yu Shao, Fanyi Zhao

Abstract: We generalize the conventional Energy-Energy Correlator (EEC) to include the azimuthal angle dependence, so as to define azimuthal angle dependent EEC observables. We study this new EEC observable in $e^+e^-$ and semi-inclusive deep inelastic scattering (SIDIS). In the back-to-back region, we find that the azimuthal angle dependent EEC is sensitive to both the unpolarized EEC jet function and a Collins-type EEC jet function. While the unpolarized EEC jet function is related to the unpolarized transverse momentum dependent (TMD) fragmentation function, the Collins-type EEC jet function is connected with the Collins fragmentation function. We further demonstrate how the new observables allow us to access the 3D structure of nucleons, especially the spin-dependent ones.

Submitted 13 July, 2023; originally announced July 2023.

Comments: Presented at DIS2023: XXX International Workshop on Deep-Inelastic Scattering and Related Subjects, Michigan State University, USA, 27-31 March 2023

Showing 1–50 of 370 results for author: Kang, Z