Skip to main content

Showing 1–50 of 400 results for author: Xiong, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01085  [pdf, other

    cs.LG cs.CL

    Rethinking LLM-based Preference Evaluation

    Authors: Zhengyu Hu, Linxin Song, Jieyu Zhang, Zheyuan Xiao, **gang Wang, Zhenyu Chen, Jieyu Zhao, Hui Xiong

    Abstract: Recently, large language model (LLM)-based preference evaluation has been widely adopted to compare pairs of model responses. However, a severe bias towards lengthy responses has been observed, raising concerns about the reliability of this evaluation method. In this work, we designed a series of controlled experiments to study the major impacting factors of the metric of LLM-based preference eval… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00128  [pdf, other

    cs.IR cs.AI cs.LG

    When Search Engine Services meet Large Language Models: Visions and Challenges

    Authors: Haoyi Xiong, Jiang Bian, Yuchen Li, Xuhong Li, Mengnan Du, Shuaiqiang Wang, Dawei Yin, Sumi Helal

    Abstract: Combining Large Language Models (LLMs) with search engine services marks a significant shift in the field of services computing, opening up new possibilities to enhance how we search for and retrieve information, understand content, and interact with internet services. This paper conducts an in-depth examination of how integrating LLMs with search engines can mutually benefit both technologies. We… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Under Review

  3. arXiv:2406.18115  [pdf, other

    cs.RO cs.AI cs.CV

    Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps

    Authors: Dicong Qiu, Wenzong Ma, Zhenfu Pan, Hui Xiong, Junwei Liang

    Abstract: Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instruction… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Open-vocabulary, Mobile Manipulation, Dynamic Environments, 3D Semantic Maps, Zero-shot, LLMs, VLMs, 18 pages, 2 figures

  4. Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

    Authors: Tianfu Wang, Li Shen, Qilin Fan, Tong Xu, Tongliang Liu, Hui Xiong

    Abstract: As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approach… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Services Computing (TSC)

    Journal ref: IEEE Transactions on Services Computing ( Volume: 17, Issue: 3, May-June 2024)

  5. arXiv:2406.12923  [pdf, other

    cs.LG cs.MA

    Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction

    Authors: Wenzhao Jiang, **dong Han, Hao Liu, Tao Tao, Naiqiang Tan, Hui Xiong

    Abstract: Rapid urbanization has significantly escalated traffic congestion, underscoring the need for advanced congestion prediction services to bolster intelligent transportation systems. As one of the world's largest ride-hailing platforms, DiDi places great emphasis on the accuracy of congestion prediction to enhance the effectiveness and reliability of their real-time services, such as travel time esti… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  6. arXiv:2406.12355  [pdf, other

    cs.CV

    LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition

    Authors: Yunze Deng, Haijun Xiong, Bin Feng

    Abstract: Gait recognition is a biometric technology that identifies individuals by using walking patterns. Due to the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook intrinsic characteristics of modalities, and lack fine-grained fusion and temporal modeling. In this p… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by ICIP2024

  7. arXiv:2406.11920  [pdf, other

    cs.LG cs.AI

    Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking

    Authors: Xi Chen, Chuan Qin, Chuyu Fang, Chao Wang, Chen Zhu, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong

    Abstract: In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promotin… ▽ More

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11357  [pdf, other

    cs.CL cs.AI

    Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities

    Authors: Zhonghao Li, Xuming Hu, Aiwei Liu, Kening Zheng, Sirui Huang, Hui Xiong

    Abstract: Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-extensive tasks. To address this, Retrieval-Augmented Generation (RAG) incorporates external document chunks to expand LLM knowledge. Furthermore, compressing information from document chunks through extraction or summarization can improve LLM performance. Nonetheless, LLMs still struggle… ▽ More

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  9. arXiv:2406.10855  [pdf, other

    cs.CV cs.AI

    ALPS: An Auto-Labeling and Pre-training Scheme for Remote Sensing Segmentation With Segment Anything Model

    Authors: Song Zhang, Qingzhong Wang, Junyi Liu, Haoyi Xiong

    Abstract: In the fast-growing field of Remote Sensing (RS) image analysis, the gap between massive unlabeled datasets and the ability to fully utilize these datasets for advanced RS analytics presents a significant challenge. To fill the gap, our work introduces an innovative auto-labeling framework named ALPS (Automatic Labeling for Pre-training in Segmentation), leveraging the Segment Anything Model (SAM)… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  10. arXiv:2406.07413  [pdf, other

    cs.LG

    Holistic Memory Diversification for Incremental Learning in Growing Graphs

    Authors: Ziyue Qiao, Junren Xiao, Qingqiang Sun, Meng Xiao, Hui Xiong

    Abstract: This paper addresses the challenge of incremental learning in growing graphs with increasingly complex tasks. The goal is to continually train a graph model to handle new tasks while retaining its inference ability on previous tasks. Existing methods usually neglect the importance of memory diversity, limiting in effectively selecting high-quality memory from previous tasks and remembering broad p… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.05504  [pdf, other

    cs.LG

    G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes

    Authors: Hong Xiong, Feng Wu, Leon Deng, Megan Su, Li-wei H Lehman

    Abstract: In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. Prior machine learning approaches for counterfactual predictions under time-varying treatments focus on static time-varying treatment regimes where treatments do not depend on previous cova… ▽ More

    Submitted 27 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  12. arXiv:2406.01512  [pdf, other

    cs.CL

    MAD: Multi-Alignment MEG-to-Text Decoding

    Authors: Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, **ni Zhou, Won Hee Lee, Ren**g Xu, Hui Xiong

    Abstract: Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current works under-investigated three points: 1) a predomi… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2405.20291  [pdf, other

    cs.CR cs.CV cs.LG

    Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness

    Authors: Weilin Lin, Li Liu, Shaokui Wei, Jianze Li, Hui Xiong

    Abstract: The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recently, without poisoned data, unlearning models with clean data and then learning a pruning mask have contributed to backdoor defense. Additionally, vanilla fine-tuning with those clean data can help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  14. arXiv:2405.15154  [pdf, other

    cs.AI cs.LG

    Online Prompt Pricing based on Combinatorial Multi-Armed Bandit and Hierarchical Stackelberg Game

    Authors: Meiling Li, Hongrun Ren, Haixu Xiong, Zhenxing Qian, Xinpeng Zhang

    Abstract: Generation models have shown promising performance in various tasks, making trading around machine learning models possible. In this paper, we aim at a novel prompt trading scenario, prompt bundle trading (PBT) system, and propose an online pricing mechanism. Based on the combinatorial multi-armed bandit (CMAB) and three-stage hierarchical Stackelburg (HS) game, our pricing mechanism considers the… ▽ More

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  15. arXiv:2405.14398  [pdf, other

    cs.HC cs.AI eess.SP

    SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network

    Authors: Weiyu Guo, Ying Sun, Yijie Xu, Ziyue Qiao, Yongkui Yang, Hui Xiong

    Abstract: Surface electromyography (sEMG) based gesture recognition offers a natural and intuitive interaction modality for wearable devices. Despite significant advancements in sEMG-based gesture-recognition models, existing methods often suffer from high computational latency and increased energy consumption. Additionally, the inherent instability of sEMG signals, combined with their sensitivity to distri… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.14312  [pdf, other

    cs.CV cs.CL cs.MM

    Improving Gloss-free Sign Language Translation by Reducing Representation Density

    Authors: **hui Ye, Xing Wang, Wenxiang Jiao, Junwei Liang, Hui Xiong

    Abstract: Gloss-free sign language translation (SLT) aims to develop well-performing SLT systems with no requirement for the costly gloss annotations, but currently still lags behind gloss-based approaches significantly. In this paper, we identify a representation density problem that could be a bottleneck in restricting the performance of gloss-free SLT. Specifically, the representation density problem des… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Representation Density and Performance Drop

  17. arXiv:2405.14279  [pdf, other

    cs.GT econ.TH

    Optimized Cost Per Click in Online Advertising: A Theoretical Analysis

    Authors: Kaichen Zhang, Zixuan Yuan, Hui Xiong

    Abstract: In recent years, Optimized Cost Per Click (OCPC) and Optimized Cost Per Mille (OCPM) have emerged as the most widely adopted pricing models in the online advertising industry. However, the existing literature has yet to identify the specific conditions under which these models outperform traditional pricing models like Cost Per Click (CPC) and Cost Per Action (CPA). To fill the gap, this paper bui… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGKDD2024 Research Track

  18. arXiv:2405.13389  [pdf, other

    cs.CV cs.MM cs.RO

    HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera

    Authors: Yunfan Lu, Zipeng Wang, Yusheng Wang, Hui Xiong

    Abstract: Continuous space-time video super-resolution (C-STVSR) aims to simultaneously enhance video resolution and frame rate at an arbitrary scale. Recently, implicit neural representation (INR) has been applied to video restoration, representing videos as implicit fields that can be decoded at an arbitrary scale. However, the highly ill-posed nature of C-STVSR limits the effectiveness of current INR-bas… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 30 pages, 20 figures, 8 tables. This work was submitted for review in the second half of 2023. Project page: https://github.com/yunfanLu/HR-INR

  19. arXiv:2405.12821  [pdf, other

    cs.RO cs.CV

    Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

    Authors: Runwei Guan, Ruixiao Zhang, Ningwei Ouyang, Jianan Liu, Ka Lok Man, Xiaohao Cai, Ming Xu, Jeremy Smith, Eng Gee Lim, Yutao Yue, Hui Xiong

    Abstract: Embodied perception is essential for intelligent vehicles and robots, enabling more natural interaction and task execution. However, these advancements currently embrace vision level, rarely focusing on using 3D modeling sensors, which limits the full understanding of surrounding objects with multi-granular characteristics. Recently, as a promising automotive sensor with affordable cost, 4D Millim… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  20. arXiv:2405.10640  [pdf, other

    cs.SI

    COMET: NFT Price Prediction with Wallet Profiling

    Authors: Tianfu Wang, Liwei Deng, Chao Wang, Jianxun Lian, Yue Yan, Nicholas **g Yuan, Qi Zhang, Hui Xiong

    Abstract: As the non-fungible token (NFT) market flourishes, price prediction emerges as a pivotal direction for investors gaining valuable insight to maximize returns. However, existing works suffer from a lack of practical definitions and standardized evaluations, limiting their practical application. Moreover, the influence of users' multi-behaviour transactions that are publicly accessible on NFT price… ▽ More

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 (ADS Track)

  21. arXiv:2405.07991  [pdf, other

    cs.RO cs.AI cs.CV cs.LG eess.SY

    SPIN: Simultaneous Perception, Interaction and Navigation

    Authors: Shagun Uppal, Ananye Agarwal, Haoyu Xiong, Kenneth Shaw, Deepak Pathak

    Abstract: While there has been remarkable progress recently in the fields of manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must make a diverse range of long-horizon tasks feasible in unstructured and dynamic environments. While the applications are broad and interesting, there are a plethora of challenges in… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: In CVPR 2024. Website at https://spin-robot.github.io/

  22. arXiv:2405.06459  [pdf, other

    cs.CL cs.AI

    Are EEG-to-Text Models Working?

    Authors: Hyejeong Jo, Yiqian Yang, Juhyeok Han, Yiqun Duan, Hui Xiong, Won Hee Lee

    Abstract: This work critically analyzes existing models for open-vocabulary EEG-to-Text translation. We identify a crucial limitation: previous studies often employed implicit teacher-forcing during evaluation, artificially inflating performance metrics. Additionally, they lacked a critical benchmark - comparing model performance on pure noise inputs. We propose a methodology to differentiate between models… ▽ More

    Submitted 13 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  23. arXiv:2405.04867  [pdf, other

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Hai** Zeng, Kai Feng , et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  24. arXiv:2405.04103  [pdf, other

    cs.CV

    COM3D: Leveraging Cross-View Correspondence and Cross-Modal Mining for 3D Retrieval

    Authors: Hao Wu, Ruochong LI, Hao Wang, Hui Xiong

    Abstract: In this paper, we investigate an open research task of cross-modal retrieval between 3D shapes and textual descriptions. Previous approaches mainly rely on point cloud encoders for feature extraction, which may ignore key inherent features of 3D shapes, including depth, spatial hierarchy, geometric continuity, etc. To address this issue, we propose COM3D, making the first attempt to exploit the cr… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024 oral

  25. arXiv:2404.14642  [pdf, other

    cs.LG

    Uncertainty Quantification on Graph Learning: A Survey

    Authors: Chao Chen, Chenghua Guo, Rui Xu, Xiangwen Liao, Xi Zhang, Sihong Xie, Hui Xiong, Philip Yu

    Abstract: Graphical models, including Graph Neural Networks (GNNs) and Probabilistic Graphical Models (PGMs), have demonstrated their exceptional capabilities across numerous fields. These models necessitate effective uncertainty quantification to ensure reliable decision-making amid the challenges posed by model training discrepancies and unpredictable testing scenarios. This survey examines recent works t… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  26. arXiv:2404.13067  [pdf, other

    cs.CL cs.AI cs.LG

    Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach

    Authors: Feihu Jiang, Chuan Qin, **gshuai Zhang, Kaichun Yao, Xi Chen, Dazhong Shen, Chen Zhu, Hengshu Zhu, Hui Xiong

    Abstract: In the contemporary era of widespread online recruitment, resume understanding has been widely acknowledged as a fundamental and crucial task, which aims to extract structured information from resume documents automatically. Compared to the traditional rule-based approaches, the utilization of recently proposed pre-trained document understanding models can greatly enhance the effectiveness of resu… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: ICME 2024 Accepted

  27. arXiv:2404.12633  [pdf, other

    cs.AI cs.NI

    FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation

    Authors: Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas **g Yuan, Hui Xiong

    Abstract: Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by the unidirectional action design and one-size-fits-all training strategy, result… ▽ More

    Submitted 1 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  28. arXiv:2404.11213  [pdf, other

    eess.SP cs.AI

    Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis

    Authors: Weiyu Guo, Ziyue Qiao, Ying Sun, Hui Xiong

    Abstract: Gesture recognition based on surface electromyography (sEMG) has been gaining importance in many 3D Interactive Scenes. However, sEMG is easily influenced by various forms of noise in real-world environments, leading to challenges in providing long-term stable interactions through sEMG. Existing methods often struggle to enhance model noise resilience through various predefined data augmentation t… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  29. arXiv:2404.10337  [pdf, other

    cs.AI

    Intriguing Properties of Positional Encoding in Time Series Forecasting

    Authors: Jianqi Zhang, **gyao Wang, Wenwen Qiang, Fanjiang Xu, Changwen Zheng, Fuchun Sun, Hui Xiong

    Abstract: Transformer-based methods have made significant progress in time series forecasting (TSF). They primarily handle two types of tokens, i.e., temporal tokens that contain all variables of the same timestamp, and variable tokens that contain all input time points for a specific variable. Transformer-based methods rely on positional encoding (PE) to mark tokens' positions, facilitating the model to pe… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  30. arXiv:2404.08695  [pdf, other

    cs.CL cs.AI cs.IR

    Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models

    Authors: Feihu Jiang, Chuan Qin, Kaichun Yao, Chuyu Fang, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong

    Abstract: Efficient knowledge management plays a pivotal role in augmenting both the operational efficiency and the innovative capacity of businesses and organizations. By indexing knowledge through vectorization, a variety of knowledge retrieval methods have emerged, significantly enhancing the efficacy of knowledge management systems. Recently, the rapid advancements in generative natural language process… ▽ More

    Submitted 20 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: DASFAA 2024 Accepted

  31. arXiv:2404.05331  [pdf, other

    cs.CV

    Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt

    Authors: Zhiqi Huang, Huixin Xiong, Haoyu Wang, Longguang Wang, Zhiheng Li

    Abstract: Text-to-image generation has witnessed great progress, especially with the recent advancements in diffusion models. Since texts cannot provide detailed conditions like object appearance, reference images are usually leveraged for the control of objects in the generated images. However, existing methods still suffer limited accuracy when the relationship between the foreground and background is com… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  32. arXiv:2404.04313  [pdf, other

    cs.IR cs.AI

    JobFormer: Skill-Aware Job Recommendation with Semantic-Enhanced Transformer

    Authors: Zhihao Guan, Jia-Qi Yang, Yang Yang, Hengshu Zhu, Wenjie Li, Hui Xiong

    Abstract: Job recommendation aims to provide potential talents with suitable job descriptions (JDs) consistent with their career trajectory, which plays an essential role in proactive talent recruitment. In real-world management scenarios, the available JD-user records always consist of JDs, user profiles, and click data, in which the user profiles are typically summarized as the user's skill distribution f… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  33. arXiv:2404.02731  [pdf, other

    eess.IV cs.CV cs.MM

    Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

    Authors: Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong

    Abstract: Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges introduced by the inherent flaws in the sensor design of event cameras in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing ne… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted for the CVPR 2024 Workshop on Mobile Intelligent Photography & Imaging

  34. arXiv:2404.01897  [pdf, other

    cs.NE cs.AI cs.LG

    Continuous Spiking Graph Neural Networks

    Authors: Nan Yin, Mengzhu Wan, Li Shen, Hitesh Laxmichand Patel, Baopu Li, Bin Gu, Huan Xiong

    Abstract: Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs) by introducing continuous dynamics. They typically draw inspiration from diffusion-based methods to introduce a novel propagation scheme, which is analyzed using ordinary differential equations (ODE). However, the implementation of CGNNs req… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  35. arXiv:2403.18864  [pdf, other

    physics.ao-ph cs.AI cs.LG

    Interpretable Machine Learning for Weather and Climate Prediction: A Survey

    Authors: Ruyi Yang, **gyu Hu, Zihao Li, Jianli Mu, Tingzhao Yu, Jiangjiang Xia, Xuhong Li, Aritra Dasgupta, Haoyi Xiong

    Abstract: Advanced machine learning models have recently achieved high predictive accuracy for weather and climate prediction. However, these complex models often lack inherent transparency and interpretability, acting as "black boxes" that impede user trust and hinder further model improvements. As such, interpretable machine learning techniques have become crucial in enhancing the credibility and utility… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 26 pages, 5 figures

  36. arXiv:2403.17416  [pdf, other

    cs.IR

    AFDGCF: Adaptive Feature De-correlation Graph Collaborative Filtering for Recommendations

    Authors: Wei Wu, Chao Wang, Dazhong Shen, Chuan Qin, Liyi Chen, Hui Xiong

    Abstract: Collaborative filtering methods based on graph neural networks (GNNs) have witnessed significant success in recommender systems (RS), capitalizing on their ability to capture collaborative signals within intricate user-item relationships via message-passing mechanisms. However, these GNN-based RS inadvertently introduce excess linear correlation between user and item embeddings, contradicting the… ▽ More

    Submitted 15 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR2024

  37. arXiv:2403.13491  [pdf, other

    cs.DB

    Distance Comparison Operators for Approximate Nearest Neighbor Search: Exploration and Benchmark

    Authors: Zeyu Wang, Haoran Xiong, Zhenying He, Peng Wang, Wei wang

    Abstract: Approximate nearest neighbor search (ANNS) on high-dimensional vectors has become a fundamental and essential component in various machine learning tasks. Prior research has shown that the distance comparison operation is the bottleneck of ANNS, which determines the query and indexing performance. To overcome this challenge, some novel methods have been proposed recently. The basic idea is to esti… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  38. arXiv:2403.13325  [pdf, other

    cs.IR

    Harnessing Large Language Models for Text-Rich Sequential Recommendation

    Authors: Zhi Zheng, Wenshuo Chao, Zhaopeng Qiu, Hengshu Zhu, Hui Xiong

    Abstract: Recent advances in Large Language Models (LLMs) have been changing the paradigm of Recommender Systems (RS). However, when items in the recommendation scenarios contain rich textual information, such as product descriptions in online shop** or news headlines on social media, LLMs require longer texts to comprehensively depict the historical user behavior sequence. This poses significant challeng… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  39. arXiv:2403.06488  [pdf, other

    cs.CV

    Query-guided Prototype Evolution Network for Few-Shot Segmentation

    Authors: Runmin Cong, Hang Xiong, **peng Chen, Wei Zhang, Qingming Huang, Yao Zhao

    Abstract: Previous Few-Shot Segmentation (FSS) approaches exclusively utilize support features for prototype generation, neglecting the specific requirements of the query. To address this, we present the Query-guided Prototype Evolution Network (QPENet), a new method that integrates query features into the generation process of foreground and background prototypes, thereby yielding customized prototypes att… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TMM 2024

  40. arXiv:2403.04306  [pdf, other

    cs.CV cs.AI cs.LG

    Effectiveness Assessment of Recent Large Vision-Language Models

    Authors: Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-** Fan, Fahad Shahbaz Khan

    Abstract: The advent of large vision-language models (LVLMs) represents a remarkable advance in the quest for artificial general intelligence. However, the model's effectiveness in both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of… ▽ More

    Submitted 11 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by Visual Intelligence

  41. arXiv:2403.02626  [pdf, other

    cs.CV cs.LG

    Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

    Authors: Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig

    Abstract: From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing. Traditionally, develo** classifiers for such concepts requires substantial manual effort measured in hours, days, or even months to identify and annotate data needed for training. Even with recently proposed Agile Modeling techniques, whi… ▽ More

    Submitted 19 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  42. arXiv:2403.01748  [pdf, other

    cs.CL cs.AI

    NeuSpeech: Decode Neural signal as Speech

    Authors: Yiqian Yang, Yiqun Duan, Qiang Zhang, Hyejeong Jo, **ni Zhou, Won Hee Lee, Ren**g Xu, Hui Xiong

    Abstract: Decoding language from brain dynamics is an important open direction in the realm of brain-computer interface (BCI), especially considering the rapid growth of large language models. Compared to invasive-based signals which require electrode implantation surgery, non-invasive neural signals (e.g. EEG, MEG) have attracted increasing attention considering their safety and generality. However, the ex… ▽ More

    Submitted 3 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  43. arXiv:2402.18278  [pdf, other

    cs.CV

    EAN-MapNet: Efficient Vectorized HD Map Construction with Anchor Neighborhoods

    Authors: Huiyuan Xiong, Jun Shen, Taohong Zhu, Yuelong Pan

    Abstract: High-definition (HD) map is crucial for autonomous driving systems. Most existing works design map elements detection heads based on the DETR decoder. However, the initial queries lack explicit incorporation of physical positional information, and vanilla self-attention entails high computational complexity. Therefore, we propose EAN-MapNet for Efficiently constructing HD map using Anchor Neighbor… ▽ More

    Submitted 7 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  44. arXiv:2402.17680  [pdf, other

    cs.CV

    MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning

    Authors: Huiyu Xiong, Lanxiao Wang, Heqian Qiu, Tai** Zhao, Benliu Qiu, Hongliang Li

    Abstract: To address the problem of catastrophic forgetting due to the invisibility of old categories in sequential input, existing work based on relatively simple categorization tasks has made some progress. In contrast, video captioning is a more complex task in multimodal scenario, which has not been explored in the field of incremental learning. After identifying this stability-plasticity problem when a… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 13 pages

  45. arXiv:2402.13548  [pdf, other

    cs.LG eess.SP

    DiffPLF: A Conditional Diffusion Model for Probabilistic Forecasting of EV Charging Load

    Authors: Siyang Li, Hui Xiong, Yize Chen

    Abstract: Due to the vast electric vehicle (EV) penetration to distribution grid, charging load forecasting is essential to promote charging station operation and demand-side management.However, the stochastic charging behaviors and associated exogenous factors render future charging load patterns quite volatile and hard to predict. Accordingly, we devise a novel Diffusion model termed DiffPLF for Probabili… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to the 23rd Power Systems Computation Conference (PSCC). Code is released at https://github.com/LSY-Cython/DiffPLF

  46. arXiv:2402.07158  [pdf, other

    cs.SE cs.LG

    Effort and Size Estimation in Software Projects with Large Language Model-based Intelligent Interfaces

    Authors: Claudionor N. Coelho Jr, Hanchen Xiong, Tushar Karayil, Sree Koratala, Rex Shang, Jacob Bollinger, Mohamed Shabar, Syam Nair

    Abstract: The advancement of Large Language Models (LLM) has also resulted in an equivalent proliferation in its applications. Software design, being one, has gained tremendous benefits in using LLMs as an interface component that extends fixed user stories. However, inclusion of LLM-based AI agents in software design often poses unexpected challenges, especially in the estimation of development efforts. Th… ▽ More

    Submitted 28 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  47. Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation

    Authors: Shuyao Wang, Yongduo Sui, Jiancan Wu, Zhi Zheng, Hui Xiong

    Abstract: In the realm of deep learning-based recommendation systems, the increasing computational demands, driven by the growing number of users and items, pose a significant challenge to practical deployment. This challenge is primarily twofold: reducing the model size while effectively learning user and item representations for efficient recommendations. Despite considerable advancements in model compres… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 10 pages, 5 figures, 4 tables. Accecpted by WSDM 2024

  48. arXiv:2402.02149  [pdf, other

    cs.CV cs.LG

    Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance

    Authors: Xinyu Peng, Ziyang Zheng, Wenrui Dai, Nuoqian Xiao, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without retraining for specific inverse problems. In this paper, we reveal that recent methods can be uniformly interpreted as employing a Gaussian approximation with hand-crafted isotropic covariance for the intractable denoising posterior to approximate the conditional posterior mean. Inspired by this… ▽ More

    Submitted 2 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  49. arXiv:2402.01749  [pdf, other

    cs.CY cs.AI cs.LG

    Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models

    Authors: Weijia Zhang, **dong Han, Zhao Xu, Hang Ni, Hao Liu, Hui Xiong

    Abstract: Machine learning techniques are now integral to the advancement of intelligent urban services, playing a crucial role in elevating the efficiency, sustainability, and livability of urban environments. The recent emergence of foundation models such as ChatGPT marks a revolutionary shift in the fields of machine learning and artificial intelligence. Their unparalleled capabilities in contextual unde… ▽ More

    Submitted 29 January, 2024; originally announced February 2024.

  50. arXiv:2401.16011  [pdf, other

    cs.LG cs.AI cs.SI

    GPS: Graph Contrastive Learning via Multi-scale Augmented Views from Adversarial Pooling

    Authors: Wei Ju, Yiyang Gu, Zhengyang Mao, Ziyue Qiao, Yifang Qin, Xiao Luo, Hui Xiong, Ming Zhang

    Abstract: Self-supervised graph representation learning has recently shown considerable promise in a range of fields, including bioinformatics and social networks. A large number of graph contrastive learning approaches have shown promising performance for representation learning on graphs, which train models by maximizing agreement between original graphs and their augmented views (i.e., positive views). U… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by SCIENCE CHINA Information Sciences (SCIS 2024)