Showing 1–50 of 129 results for author: Kang, Z

  1. arXiv:2406.12501  [pdf, other]

    cs.IR

    Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback

    Authors: Guipeng Xv, Xinyu Li, Ruobing Xie, Chen Lin, Chong Liu, Feng Xia, Zhanhui Kang, Leyu Lin

    Abstract: Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommende… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2405.15280  [pdf, other]

    cs.IR cs.AI cs.LG

    DFGNN: Dual-frequency Graph Neural Network for Sign-aware Feedback

    Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Xu Zhang, Fuzhen Zhuang, Leyu Lin, Zhanhui Kang, Yongjun Xu

    Abstract: The graph-based recommendation has achieved great success in recent years. However, most existing graph-based recommendations focus on capturing user preference based on positive edges/feedback, while ignoring negative edges/feedback (e.g., dislike, low rating) that widely exist in real-world recommender systems. How to utilize negative feedback in graph-based recommendations still remains underex… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 Research Track

  3. arXiv:2405.03562  [pdf, other]

    cs.IR

    ID-centric Pre-training for Recommendation

    Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Fuzhen Zhuang, Xu Zhang, Leyu Lin, Zhanhui Kang, Yongjun Xu

    Abstract: Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are challenging to be transferred to new domains. With the thriving of pre-trained language model (PLM), some pioneer works adopt PLM for pre-trained recommendation, where modality information (e.g., text) is considered un… ▽ More

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2404.15704  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

    Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limit improvements. To this end, we propose an adversarial comp… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  5. arXiv:2404.14721  [pdf, other]

    cs.LG

    Dynamically Anchored Prompting for Task-Imbalanced Continual Learning

    Authors: Chenxing Hong, Yan Jin, Zhiqi Kang, Yizhou Chen, Mengke Li, Yang Lu, Hanzi Wang

    Abstract: Existing continual learning literature relies heavily on a strong assumption that tasks arrive with a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning (TICL) scenarios where the distribution of task data is non-uniform across the whole learning process. We find that imbalanced tasks significantly challenge the… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  6. arXiv:2404.13892  [pdf, other]

    cs.SD cs.AI eess.AS

    Retrieval-Augmented Audio Deepfake Detection

    Authors: Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

    Abstract: With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired… ▽ More

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Conference on Multimedia Retrieval (ICMR 2024)

  7. arXiv:2404.11375  [pdf, other]

    cs.CV cs.MM

    Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion

    Authors: Xinghan Wang, Zixi Kang, Yadong Mu

    Abstract: Human motion understanding is a fundamental task with diverse practical applications, facilitated by the availability of large-scale motion capture datasets. Recent studies focus on text-motion tasks, such as text-based motion generation, editing and question answering. In this study, we introduce the novel task of text-based human motion grounding (THMG), aimed at precisely localizing temporal se… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  8. arXiv:2404.08796  [pdf, other]

    cs.IR

    The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation

    Authors: Zekai Qu, Ruobing Xie, Chaojun Xiao, Xingwu Sun, Zhanhui Kang

    Abstract: Sequential recommendation (SR) has seen significant advancements with the help of Pre-trained Language Models (PLMs). Some PLM-based SR models directly use PLM to encode user historical behavior's text sequences to learn user representations, while there is seldom an in-depth exploration of the capability and suitability of PLM in behavior sequence modeling. In this work, we first conduct extensiv… ▽ More

    Submitted 17 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages

  9. arXiv:2404.08793  [pdf, other]

    cs.CR cs.CL cs.HC

    JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models

    Authors: Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen

    Abstract: The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential we… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Submitted to VIS 2024

  10. arXiv:2403.11116  [pdf, other]

    cs.CV cs.AI

    PhD: A Prompted Visual Hallucination Evaluation Dataset

    Authors: Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li

    Abstract: The rapid growth of Large Language Models (LLMs) has driven the development of Large Vision-Language Models (LVLMs). The challenge of hallucination, prevalent in LLMs, also emerges in LVLMs. However, most existing efforts mainly focus on object hallucination in LVLM, ignoring diverse types of LVLM hallucinations. In this study, we delve into the Intrinsic Vision-Language Hallucination (IVL-Hallu)… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  11. arXiv:2403.03676  [pdf, other]

    cs.LG

    Simplified PCNet with Robustness

    Authors: Bingheng Li, Xuanting Xie, Haoxiang Lei, Ruiyi Fang, Zhao Kang

    Abstract: Graph Neural Networks (GNNs) have garnered significant attention for their success in learning the representation of homophilic or heterophilic graphs. However, they cannot generalize well to real-world graphs with different levels of homophily. In response, the Poisson-Charlier Network (PCNet) \cite{li2024pc}, the previous work, allows graph representation to be learned from heterophily to homoph… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 10 pages, 3 figures

  12. arXiv:2403.03670  [pdf, other]

    cs.LG

    CDC: A Simple Framework for Complex Data Clustering

    Authors: Zhao Kang, Xuanting Xie, Bingheng Li, Erlin Pan

    Abstract: In today's data-driven digital era, the amount as well as complexity, such as multi-view, non-Euclidean, and multi-relational, of the collected data are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are independently developed to handle one particular challenge at the expense of t… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures

  13. arXiv:2403.03666  [pdf, other]

    cs.LG

    Provable Filter for Real-world Graph Clustering

    Authors: Xuanting Xie, Erlin Pan, Zhao Kang, Wenyu Chen, Bingheng Li

    Abstract: Graph clustering, an important unsupervised problem, has been shown to be more resistant to advances in Graph Neural Networks (GNNs). In addition, almost all clustering methods focus on homophilic graphs and ignore heterophily. This significantly limits their applicability in practice, since real-world graphs exhibit a structural disparity and cannot simply be classified as homophily and heterophi… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  14. arXiv:2403.03659  [pdf, other]

    cs.LG

    Robust Graph Structure Learning under Heterophily

    Authors: Xuanting Xie, Zhao Kang, Wenyu Chen

    Abstract: Graph is a fundamental mathematical structure in characterizing relations between different objects and has been widely used on various learning tasks. Most methods implicitly assume a given graph to be accurate and complete. However, real data is inevitably noisy and sparse, which will lead to inferior results. Despite the remarkable success of recent graph representation learning methods, they i… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 26 pages, 5 figures

  15. arXiv:2403.02775  [pdf, other]

    cs.AI cs.LG

    EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs

    Authors: Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang

    Abstract: Large language models (LLMs) have proven to be very superior to conventional methods in various tasks. However, their expensive computations and high memory requirements are prohibitive for deployment. Model quantization is an effective method for reducing this overhead. The problem is that in most previous works, the quantized model was calibrated using few samples from the training data, which m… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  16. arXiv:2403.01886  [pdf, other]

    cs.CL cs.AI

    FCDS: Fusing Constituency and Dependency Syntax into Document-Level Relation Extraction

    Authors: Xudong Zhu, Zhao Kang, Bei Hui

    Abstract: Document-level Relation Extraction (DocRE) aims to identify relation labels between entities within a single document. It requires handling several sentences and reasoning over them. State-of-the-art DocRE methods use a graph structure to connect entities across the document to capture dependency syntax information. However, this is insufficient to fully exploit the rich syntax information in the… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Appear in COLING 2024

  17. arXiv:2402.18581  [pdf, other]

    cs.NE cs.AI

    Multi-objective Optimal Roadside Units Deployment in Urban Vehicular Networks

    Authors: Weian Guo, Zecheng Kang, Dongyang Li, Lun Zhang, Li Li

    Abstract: The significance of transportation efficiency, safety, and related services is increasing in urban vehicular networks. Within such networks, roadside units (RSUs) serve as intermediates in facilitating communication. Therefore, the deployment of RSUs is of utmost importance in ensuring the quality of communication services. However, the optimization objectives, such as time delay and deployment co… ▽ More

    Submitted 14 January, 2024; originally announced February 2024.

    Comments: This manuscript has been submitted to the journal of IEEE Transactions on Vehicular Technology

  18. arXiv:2402.13607  [pdf, other]

    cs.CV cs.CL

    CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

    Authors: Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

    Abstract: Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpret… ▽ More

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  19. arXiv:2402.04883  [pdf, other]

    cs.CV

    Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration

    Authors: Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Ruimao Zhang

    Abstract: Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces, as well as the accuracy of object localization within the 3D space. This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization. Different from previous met… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to ICRA2024

  20. arXiv:2402.01516  [pdf, other]

    cs.CV

    Cross-view Masked Diffusion Transformers for Person Image Synthesis

    Authors: Trung X. Pham, Zhang Kang, Chang D. Yoo

    Abstract: We present X-MDPT (Cross-view Masked Diffusion Prediction Transformers), a novel diffusion model designed for pose-guided human image generation. X-MDPT distinguishes itself by employing masked diffusion transformers that operate on latent patches, a departure from the commonly-used Unet structures in existing works. The model c… ▽ More

    Submitted 3 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  21. arXiv:2401.02913  [pdf, other]

    cs.IR

    Plug-in Diffusion Model for Sequential Recommendation

    Authors: Haokai Ma, Ruobing Xie, Lei Meng, Xin Chen, Xu Zhang, Leyu Lin, Zhanhui Kang

    Abstract: Pioneering efforts have verified the effectiveness of the diffusion models in exploring the informative uncertainty for recommendation. Considering the difference between recommendation and image synthesis tasks, existing methods have undertaken tailored refinements to the diffusion and reverse process. However, these approaches typically use the highest-score item in corpus for user interest pred… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  22. arXiv:2312.17484  [pdf, other]

    cs.CL cs.AI

    Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning

    Authors: Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu

    Abstract: Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the prob… ▽ More

    Submitted 14 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  23. arXiv:2312.14438  [pdf, other]

    cs.LG cs.AI cs.SI

    PC-Conv: Unifying Homophily and Heterophily with Two-fold Filtering

    Authors: Bingheng Li, Erlin Pan, Zhao Kang

    Abstract: Recently, many carefully crafted graph representation learning methods have achieved impressive performance on either strong heterophilic or homophilic graphs, but not both. Therefore, they are incapable of generalizing well across real-world graphs with different levels of homophily. This is attributed to their neglect of homophily in heterophilic graphs, and vice versa. In this paper, we propose… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  24. arXiv:2312.14066  [pdf, other]

    cs.LG

    Upper Bounding Barlow Twins: A Novel Filter for Multi-Relational Clustering

    Authors: Xiaowei Qian, Bingheng Li, Zhao Kang

    Abstract: Multi-relational clustering is a challenging task due to the fact that diverse semantic information conveyed in multi-layer graphs is difficult to extract and fuse. Recent methods integrate topology structure and node attribute information through graph filtering. However, they often use a low-pass filter without fully considering the correlation among multiple graphs. To overcome this drawback, w… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  25. arXiv:2311.01033  [pdf, other]

    cs.LG cs.AI cs.SI

    Non-Autoregressive Diffusion-based Temporal Point Processes for Continuous-Time Long-Term Event Prediction

    Authors: Wang-Tao Zhou, Zhao Kang, Ling Tian

    Abstract: Continuous-time long-term event prediction plays an important role in many application scenarios. Most existing works rely on autoregressive frameworks to predict event sequences, which suffer from error accumulation, thus compromising prediction quality. Inspired by the success of denoising diffusion probabilistic models, we propose a diffusion-based non-autoregressive temporal point process mode… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

  26. arXiv:2310.15929  [pdf, other]

    cs.LG cs.AI cs.CL

    E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity

    Authors: Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang

    Abstract: Traditional pruning methods are known to be difficult to apply to Large Language Models (LLMs) for Generative AI because of their unaffordable training process and large computational demands. For the first time, we introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse, to improve the accuracy of N:M sparsity on LLMs. E-Sparse employs the informat… ▽ More

    Submitted 22 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

  27. arXiv:2310.13540  [pdf, other]

    cs.IR

    Thoroughly Modeling Multi-domain Pre-trained Recommendation as Language

    Authors: Zekai Qu, Ruobing Xie, Chaojun Xiao, Yuan Yao, Zhiyuan Liu, Fengzong Lian, Zhanhui Kang, Jie Zhou

    Abstract: With the thriving of pre-trained language models (PLMs) widely verified in various NLP tasks, pioneer efforts attempt to explore the possible cooperation of the general textual information in PLM with the personalized behavioral information in user historical behavior sequences to enhance sequential recommendation (SR). However, despite the commonalities of input format and task goal, there are h… ▽ More

    Submitted 27 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

  28. arXiv:2310.04681  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model

    Authors: Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Speaker verification (SV) performance deteriorates as utterances become shorter. To this end, we propose a new architecture called VoiceExtender which provides a promising solution for improving SV performance when handling short-duration speech signals. We use two guided diffusion models, the built-in and the external speaker embedding (SE) guided diffusion model, both of which utilize a diffusio… ▽ More

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted by the 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2023)

  29. A Prototype-Based Neural Network for Image Anomaly Detection and Localization

    Authors: Chao Huang, Zhao Kang, Hong Wu

    Abstract: Image anomaly detection and localization perform not only image-level anomaly classification but also locate pixel-level anomaly regions. Recently, it has received much research attention due to its wide application in various fields. This paper proposes ProtoAD, a prototype-based neural network for image anomaly detection and localization. First, the patch features of normal images are extracted… ▽ More

    Submitted 25 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Published in Neural Processing Letters 2024

    Journal ref: Neural Process Lett 56, 169 (2024)

  30. arXiv:2309.07084  [pdf, other]

    cs.CV

    SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

    Authors: Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang

    Abstract: In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance. Our strategy involves a data enhancement method named Polar Sampling, which densifies sparse objects and trains an assistant model to generate high-quality features as the supervision. These fea… ▽ More

    Submitted 31 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV2023

  31. arXiv:2308.01217  [pdf, other]

    cs.CV

    TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval

    Authors: Kaibin Tian, Ruixiang Zhao, Hu Hu, Runquan Xie, Fengzong Lian, Zhanhui Kang, Xirong Li

    Abstract: For text-to-video retrieval (T2VR), which aims to retrieve unlabeled videos by ad-hoc textual queries, CLIP-based methods are dominating. Compared to CLIP4Clip which is efficient and compact, the state-of-the-art models tend to compute video-text similarity by fine-grained cross-modal feature interaction and matching, putting their scalability for large-scale T2VR into doubt. For efficient T2VR, w… ▽ More

    Submitted 2 August, 2023; originally announced August 2023.

  32. arXiv:2307.12286  [pdf, ps, other]

    cs.IT eess.SP

    Double-Active-IRS Aided Wireless Communication: Deployment Optimization and Capacity Scaling

    Authors: Zhenyu Kang, Changsheng You, Rui Zhang

    Abstract: In this letter, we consider a double-active-intelligent reflecting surface (IRS) aided wireless communication system, where two active IRSs are properly deployed to assist the communication from a base station (BS) to multiple users located in a given zone via the double-reflection links. Under the assumption of fixed per-element amplification power for each active-IRS element, we formulate a rate… ▽ More

    Submitted 23 July, 2023; originally announced July 2023.

  33. arXiv:2306.14072  [pdf, other]

    cs.LG cs.AI cs.SI

    Intensity-free Convolutional Temporal Point Process: Incorporating Local and Global Event Contexts

    Authors: Wang-Tao Zhou, Zhao Kang, Ling Tian, Yi Su

    Abstract: Event prediction in the continuous-time domain is a crucial but rather difficult task. Temporal point process (TPP) learning models have shown great advantages in this area. Existing models mainly focus on encoding global contexts of events using techniques like recurrent neural networks (RNNs) or self-attention mechanisms. However, local event contexts also play an important role in the occurrenc… ▽ More

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: Accepted to Information Sciences

  34. arXiv:2305.19581  [pdf, other]

    cs.SD cs.AI eess.AS

    SVVAD: Personal Voice Activity Detection for Speaker Verification

    Authors: Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Voice activity detection (VAD) improves the performance of speaker verification (SV) by preserving speech segments and attenuating the effects of non-speech. However, this scheme is not ideal: (1) it fails in noisy environments or multi-speaker conversations; (2) it is trained based on inaccurate non-SV sensitive labels. To address this, we propose a speaker verification-based voice activity detec… ▽ More

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  35. ChinaOpen: A Dataset for Open-world Multimodal Learning

    Authors: Aozhu Chen, Ziyuan Wang, Chengbo Dong, Kaibin Tian, Ruixiang Zhao, Xun Liang, Zhanhui Kang, Xirong Li

    Abstract: This paper introduces ChinaOpen, a dataset sourced from Bilibili, a popular Chinese video-sharing website, for open-world multimodal learning. While the state-of-the-art multimodal learning networks have shown impressive performance in automated video annotation and cross-modal video retrieval, their training and evaluation are primarily conducted on YouTube videos with English text. Their effecti… ▽ More

    Submitted 6 August, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted by ACMMM 2023

  36. arXiv:2305.03614  [pdf, other]

    cs.CV

    Denoising-Diffusion Alignment for Continuous Sign Language Recognition

    Authors: Leming Guo, Wanli Xue, Yuxi Zhou, Ze Kang, Tiantian Yuan, Zan Gao, Shengyong Chen

    Abstract: Continuous sign language recognition (CSLR) aims to promote active and accessible communication for the hearing impaired, by recognizing signs in untrimmed sign language videos to textual glosses sequentially. The key challenge of CSLR is how to achieve the cross-modality alignment between videos and gloss sequences. However, the current cross-modality paradigms of CSLR overlook using the glosses… ▽ More

    Submitted 3 May, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

  37. arXiv:2305.02931  [pdf, other]

    cs.SI cs.AI cs.LG

    Beyond Homophily: Reconstructing Structure for Graph-agnostic Clustering

    Authors: Erlin Pan, Zhao Kang

    Abstract: Graph neural network (GNN)-based methods have achieved impressive performance on the node clustering task. However, they are designed on the homophilic assumption of graphs, and clustering on heterophilic graphs is overlooked. Due to the lack of labels, it is impossible to first identify a graph as homophilic or heterophilic before a suitable GNN model can be found. Hence, clustering on real-world grap… ▽ More

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

  38. arXiv:2304.09421  [pdf, other]

    cs.CL cs.CV cs.LG cs.SI

    TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection

    Authors: Quanjiang Guo, Zhao Kang, Ling Tian, Zhouguo Chen

    Abstract: Fake news detection aims to detect fake news widely spreading on social media platforms, which can negatively influence the public and the government. Many approaches have been developed to exploit relevant information from news images, text, or videos. However, these methods may suffer from the following limitations: (1) ignore the inherent emotional information of the news, which could be benefi… ▽ More

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: Appear on IJCNN 2023

  39. arXiv:2303.07643  [pdf, other]

    cs.SD cs.AI eess.AS

    Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification

    Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao

    Abstract: Data-Free Knowledge Distillation (DFKD) has recently attracted growing attention in the academic community, especially with major breakthroughs in computer vision. Despite promising results, the technique has not been well applied to audio and signal processing. Due to the variable duration of audio signals, it has its own unique way of modeling. In this work, we propose feature-rich audio model i… ▽ More

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023. International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)

  40. arXiv:2303.06879  [pdf, other]

    cs.LG cs.CV

    Spacecraft Anomaly Detection with Attention Temporal Convolution Network

    Authors: Liang Liu, Ling Tian, Zhao Kang, Tianqi Wan

    Abstract: Spacecraft faces various situations when carrying out exploration missions in complex space, thus monitoring the anomaly status of spacecraft is crucial to the development of the aerospace industry. The time series telemetry data generated by on-orbit spacecraft contains important information about the status of spacecraft. However, traditional domain knowledge-… ▽ More

    Submitted 13 March, 2023; originally announced March 2023.

  41. arXiv:2303.03912  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    Document-level Relation Extraction with Cross-sentence Reasoning Graph

    Authors: Hongfei Liu, Zhao Kang, Lizong Zhang, Ling Tian, Fujun Hua

    Abstract: Relation extraction (RE) has recently moved from the sentence-level to document-level, which requires aggregating document information and using entities and mentions for reasoning. Existing works put entity nodes and mention nodes with similar representations in a document-level graph, whose complex edges may incur redundant information. Furthermore, existing studies only focus on entity-level re… ▽ More

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: This paper is accepted by PAKDD 2023

  42. arXiv:2302.13756  [pdf, other]

    cs.IR

    Multi-Feature Integration for Perception-Dependent Examination-Bias Estimation

    Authors: Xiaoshu Chen, Xiangsheng Li, Kunliang Wei, Bin Hu, Lei Jiang, Zeqian Huang, Zhanhui Kang

    Abstract: Eliminating examination bias accurately is pivotal to apply click-through data to train an unbiased ranking model. However, most examination-bias estimators are limited to the hypothesis of Position-Based Model (PBM), which supposes that the calculation of examination bias only depends on the rank of the document. Recently, although some works introduce information such as clicks in the same query… ▽ More

    Submitted 27 February, 2023; originally announced February 2023.

  43. arXiv:2302.13498  [pdf, other]

    cs.IR

    Pretraining De-Biased Language Model with Large-scale Click Logs for Document Ranking

    Authors: Xiangsheng Li, Xiaoshu Chen, Kunliang Wei, Bin Hu, Lei Jiang, Zeqian Huang, Zhanhui Kang

    Abstract: Pre-trained language models have achieved great success in various large-scale information retrieval tasks. However, most pretraining tasks are based on counterfeit retrieval data where the query produced by the tailored rule is assumed as the user's issued query on the given document or passage. Therefore, we explore using large-scale click logs to pretrain a language model instead of replyin… ▽ More

    Submitted 26 February, 2023; originally announced February 2023.

  44. arXiv:2302.09146  [pdf, other]

    cs.SE

    DMSConfig: Automated Configuration Tuning for Distributed IoT Message Systems Using Deep Reinforcement Learning

    Authors: Zhuangwei Kang, Yogesh D. Barve, Shunxing Bao, Abhishek Dubey, Aniruddha Gokhale

    Abstract: The Distributed Messaging Systems (DMSs) used in IoT systems require timely and reliable data dissemination, which can be achieved through configurable parameters. However, the high-dimensional configuration space makes it difficult for users to find the best options that maximize application throughput while meeting specific latency constraints. Existing approaches to automatic software profiling…

    Submitted 17 February, 2023; originally announced February 2023.

  45. arXiv:2302.00123  [pdf, other]

    cs.CV cs.AI

    Design and Implementation of A Soccer Ball Detection System with Multiple Cameras

    Authors: Lei Li, Tianfang Zhang, Zhongfeng Kang, Wenhan Zhang

    Abstract: The detection of small and medium-sized objects in three dimensions has always been a frontier exploration problem. This technology has a very wide application in sports analysis, games, virtual reality, human animation and other fields. The traditional three-dimensional small target detection technology has the disadvantages of high cost, low precision and inconvenience, so it is difficult to app…

    Submitted 31 January, 2023; originally announced February 2023.

    Comments: 89 pages

  46. arXiv:2301.04311  [pdf, other]

    cs.IT eess.SP

    Active-IRS-Aided Wireless Communication: Fundamentals, Designs and Open Issues

    Authors: Zhenyu Kang, Changsheng You, Rui Zhang

    Abstract: Intelligent reflecting surface (IRS) has emerged as a promising technology to realize smart radio environment for future wireless communication systems. Existing works in this line of research have mainly considered the conventional passive IRS that reflects wireless signals without power amplification, while in this article, we give an overview of a new type of IRS, called active IRS, which enabl…

    Submitted 25 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  47. arXiv:2212.14322  [pdf, other]

    cs.IR cs.AI cs.MM

    BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

    Authors: Haowen Hou, Xiaopeng Yan, Yigeng Zhang, Fengzong Lian, Zhanhui Kang

    Abstract: In the field of cross-modal retrieval, single encoder models tend to perform better than dual encoder models, but they suffer from high latency and low throughput. In this paper, we present a dual encoder model called BagFormer that utilizes a cross modal interaction mechanism to improve recall performance without sacrificing latency and throughput. BagFormer achieves this through the use of bag-w…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 8 pages, 4 figures, 4 tables

  48. arXiv:2212.09098  [pdf, other]

    cs.CV

    Mask-FPAN: Semi-Supervised Face Parsing in the Wild With De-Occlusion and UV GAN

    Authors: Lei Li, Tianfang Zhang, Zhongfeng Kang, Xikun Jiang

    Abstract: Fine-grained semantic segmentation of a person's face and head, including facial parts and head components, has progressed a great deal in recent years. However, it remains a challenging task, whereby considering ambiguous occlusions and large pose variations are particularly difficult. To overcome these difficulties, we propose a novel framework termed Mask-FPAN. It uses a de-occlusion module tha…

    Submitted 30 May, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

  49. arXiv:2212.06385  [pdf, other]

    cs.CL

    TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

    Authors: Zhe Zhao, Yudong Li, Cheng Hou, Jing Zhao, Rong Tian, Weijie Liu, Yiren Chen, Ningyuan Sun, Haoyan Liu, Weiquan Mao, Han Guo, Weigang Guo, Taiqiang Wu, Tao Zhu, Wenhang Shi, Chen Chen, Shan Huang, Sihong Chen, Liqun Liu, Feifei Li, Xiaoshuai Chen, Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, et al. (1 additional author not shown)

    Abstract: Recently, the success of pre-training in text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit…

    Submitted 11 July, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

  50. arXiv:2212.05102  [pdf, other]

    cs.CV cs.LG

    A soft nearest-neighbor framework for continual semi-supervised learning

    Authors: Zhiqi Kang, Enrico Fini, Moin Nabi, Elisa Ricci, Karteek Alahari

    Abstract: Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning--a setting where not all the data samples are labeled. A primary issue in this scenario is the model forgetting representations of unlabeled da…

    Submitted 11 September, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted at ICCV 2023