Skip to main content

Showing 1–50 of 1,436 results for author: Wang, G

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01527  [pdf, other

    cs.CL

    KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

    Authors: Jiayi Yuan, Hongyi Liu, Shaochen, Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye **, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

    Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00296  [pdf

    cs.CV

    SolarSAM: Building-scale Photovoltaic Potential Assessment Based on Segment Anything Model (SAM) and Remote Sensing for Emerging City

    Authors: Guohao Wang

    Abstract: Driven by advancements in photovoltaic (PV) technology, solar energy has emerged as a promising renewable energy source, due to its ease of integration onto building rooftops, facades, and windows. For the emerging cities, the lack of detailed street-level data presents a challenge for effectively assessing the potential of building-integrated photovoltaic (BIPV). To address this, this study intro… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  3. arXiv:2407.00046  [pdf, other

    cs.DC cs.GR

    Barrier-Augmented Lagrangian for GPU-based Elastodynamic Contact

    Authors: Dewen Guo, Minchen Li, Yin Yang, Guo** Wang, Sheng Li

    Abstract: We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and costs substantial storage and computational overhead. Moreover, constraint set… ▽ More

    Submitted 4 June, 2024; originally announced July 2024.

    Comments: 17 pages, 30 figures

  4. arXiv:2406.19844  [pdf, other

    cs.CV cs.RO

    StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction

    Authors: Jiaheng Zhuang, Guoan Wang, Siyu Zhang, Xiyang Wang, Hangning Zhou, Ziyao Xu, Chi Zhang, Zhiheng Li

    Abstract: 3D multi-object tracking and trajectory prediction are two crucial modules in autonomous driving systems. Generally, the two tasks are handled separately in traditional paradigms and a few methods have started to explore modeling these two tasks in a joint manner recently. However, these approaches suffer from the limitations of single-frame training and inconsistent coordinate representations bet… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.19646  [pdf, other

    cs.RO

    Time-optimal Flight in Cluttered Environments via Safe Reinforcement Learning

    Authors: Wei Xiao, Zhaohan Feng, Ziyu Zhou, Jian Sun, Gang Wang, Jie Chen

    Abstract: This paper addresses the problem of guiding a quadrotor through a predefined sequence of waypoints in cluttered environments, aiming to minimize the flight time while avoiding collisions. Previous approaches either suffer from prolonged computational time caused by solving complex non-convex optimization problems or are limited by the inherent smoothness of polynomial trajectory representations, t… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 7 pages, 3 figures,

  6. arXiv:2406.19623  [pdf

    cs.CE

    FRA-DiagSys: A Transformer Winding Fault Diagnosis System for Identifying Fault Types and degrees Using Frequency Response Analysis

    Authors: Guohao Wang

    Abstract: The electric power transformer is a critical component in electrical distribution networks, and the diagnosis of faults in transformers is an important research area. Frequency Response Analysis (FRA) methods are widely used for analyzing winding faults in transformers, particularly in Chinese power stations. However, the current approach relies on manual expertise to interpret FRA curves, which c… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.18884  [pdf, other

    cs.AI

    Sequential three-way group decision-making for double hierarchy hesitant fuzzy linguistic term set

    Authors: Nanfang Luo, Qinghua Zhang, Qin Xie, Yutai Wang, Longjun Yin, Guoyin Wang

    Abstract: Group decision-making (GDM) characterized by complexity and uncertainty is an essential part of various life scenarios. Most existing researches lack tools to fuse information quickly and interpret decision results for partially formed decisions. This limitation is particularly noticeable when there is a need to improve the efficiency of GDM. To address this issue, a novel multi-level sequential t… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  8. arXiv:2406.18485  [pdf, other

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin **, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  9. arXiv:2406.17507  [pdf, other

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.16989  [pdf, other

    cs.LG cs.AI

    Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

    Authors: Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Kun Kuang, Fei Wu

    Abstract: Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to tr… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.09997

  11. arXiv:2406.16268  [pdf, other

    cs.DB

    Efficient Antagonistic k-plex Enumeration in Signed Graphs

    Authors: Lantian Xu, Rong-Hua Li, Dong Wen, Qiangqiang Dai, Guoren Wang, Lu Qin

    Abstract: A signed graph is a graph where each edge receives a sign, positive or negative. The signed graph model has been used in many real applications, such as protein complex discovery and social network analysis. Finding cohesive subgraphs in signed graphs is a fundamental problem. A k-plex is a common model for cohesive subgraphs in which every vertex is adjacent to all but at most k vertices within t… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  12. arXiv:2406.15768  [pdf, other

    cs.CV

    MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

    Authors: Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang

    Abstract: In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation. However, MLLMs mainly focus on high-level image-text interpretations and struggle with fine-grained visual understanding,… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages, 8 figures

  13. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  14. arXiv:2406.14088  [pdf, other

    cs.DC cs.AI cs.CL cs.LG

    ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

    Authors: Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 13 pages (15 pages with references), 13 figures

  15. arXiv:2406.14045  [pdf, other

    cs.LG cs.AI

    Understanding Different Design Choices in Training Large Time Series Models

    Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Qiaoyu Tan, Daochen Zha, Xia Hu

    Abstract: Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datase… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  16. arXiv:2406.13768  [pdf, other

    cs.DC cs.AI cs.LG cs.PF

    FastPersist: Accelerating Model Checkpointing in Deep Learning

    Authors: Guanhua Wang, Olatunji Ruwase, Bing Xie, Yuxiong He

    Abstract: Model checkpoints are critical Deep Learning (DL) artifacts that enable fault tolerance for training and downstream applications, such as inference. However, writing checkpoints to persistent storage, and other I/O aspects of DL training, are mostly ignored by compute-focused optimization efforts for faster training of rapidly growing models and datasets. Towards addressing this imbalance, we prop… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 11 pages

  17. arXiv:2406.13674  [pdf, other

    eess.IV cs.CV

    Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

    Authors: Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

    Abstract: Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annot… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 6 tables, Early Accept to MICCAI 2024

  18. arXiv:2406.13672  [pdf, other

    cs.CV

    Q-SNNs: Quantized Spiking Neural Networks

    Authors: Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence. However, the current focus within the SNN community prioritizes accuracy optimization through the development of large-scale models, limiting their viability in r… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  19. arXiv:2406.11799  [pdf, other

    eess.IV cs.CV cs.LG

    Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation

    Authors: Song Wang, Zhong Zhang, Huan Yan, Ming Xu, Guanghui Wang

    Abstract: H&E-to-IHC stain translation techniques offer a promising solution for precise cancer diagnosis, especially in low-resource regions where there is a shortage of health professionals and limited access to expensive equipment. Considering the pixel-level misalignment of H&E-IHC image pairs, current research explores the pathological consistency between patches from the same positions of the image pa… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  20. arXiv:2406.11247  [pdf, other

    cs.CV

    STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to begin our exploration within the Minecraft environment. Our STEVE Series agents can complete basic tasks in a virtual environment and more challengin… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Embodied AI Workshop

  21. arXiv:2406.10959  [pdf, ps, other

    math.OC cs.LG

    On Convergence and Rate of Convergence of Policy Improvement Algorithms

    Authors: ** Ma, Gaozhan Wang, Jianfeng Zhang

    Abstract: In this paper we provide a simple proof from scratch for the convergence of Policy Improvement Algorithm (PIA) for a continuous time entropy-regularized stochastic control problem. Such convergence has been established by Huang-Wang-Zhou(2023) by using sophisticated PDE estimates for the iterative PDEs involved in the PIA. Our approach builds on some Feynman-Kac type probabilistic representation f… ▽ More

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: In this version we added a convergence result for the 1-dimensional model with diffusion controls

  22. arXiv:2406.08785  [pdf, other

    cs.CV

    BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection

    Authors: Wenjie Wang, Yehao Lu, Guangcong Zheng, Shuigen Zhan, Xiaoqing Ye, Zichang Tan, **gdong Wang, Gaoang Wang, Xi Li

    Abstract: Vision-based roadside 3D object detection has attracted rising attention in autonomous driving domain, since it encompasses inherent advantages in reducing blind spots and expanding perception range. While previous work mainly focuses on accurately estimating depth or height for 2D-to-3D map**, ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we p… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  23. arXiv:2406.08773  [pdf, other

    cs.CV

    DenoiseReID: Denoising Model for Representation Learning of Person Re-Identification

    Authors: Zhengrui Xu, Guan'an Wang, Xiaowen Huang, Jitao Sang

    Abstract: In this paper, we propose a novel Denoising Model for Representation Learning and take Person Re-Identification (ReID) as a benchmark task, named DenoiseReID, to improve feature discriminative with joint feature extraction and denoising. In the deep learning epoch, backbones which consists of cascaded embedding layers (e.g. convolutions or transformers) to progressively extract useful features, be… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  24. arXiv:2406.06664  [pdf, other

    eess.AS cs.LG cs.SD

    ASTRA: Aligning Speech and Text Representations for Asr without Sampling

    Authors: Neeraj Gaur, Rohan Agrawal, Gary Wang, Parisa Haghani, Andrew Rosenberg, Bhuvana Ramabhadran

    Abstract: This paper introduces ASTRA, a novel method for improving Automatic Speech Recognition (ASR) through text injection.Unlike prevailing techniques, ASTRA eliminates the need for sampling to match sequence lengths between speech and text modalities. Instead, it leverages the inherent alignments learned within CTC/RNNT models. This approach offers the following two advantages, namely, avoiding potenti… ▽ More

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: To be published in Interspeech 2024

  25. arXiv:2406.05982  [pdf

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Submitted to MAGMA for review

  26. arXiv:2406.05832  [pdf, other

    q-bio.QM cs.LG q-bio.BM

    Improving Antibody Design with Force-Guided Sampling in Diffusion Models

    Authors: Paulina Kulytė, Francisco Vargas, Simon Valentin Mathis, Yu Guang Wang, JosĆ© Miguel HernĆ”ndez-Lobato, Pietro LiĆ²

    Abstract: Antibodies, crucial for immune defense, primarily rely on complementarity-determining regions (CDRs) to bind and neutralize antigens, such as viruses. The design of these CDRs determines the antibody's affinity and specificity towards its target. Generative models, particularly denoising diffusion probabilistic models (DDPMs), have shown potential to advance the structure-based design of CDR regio… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  27. arXiv:2406.05540  [pdf, other

    q-bio.QM cs.AI cs.CL cs.LG

    A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding

    Authors: Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, Yu Guang Wang

    Abstract: The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  28. arXiv:2406.04983  [pdf, other

    cs.CV

    CityCraft: A Real Crafter for 3D City Generation

    Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures

  29. arXiv:2406.03001  [pdf, other

    cs.CV cs.AI

    EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift

    Authors: Peng Zhao, Runchu Dong, Guiqin Wang, Cong Zhao

    Abstract: Real-time video analytics systems typically place models with fewer weights on edge devices to reduce latency. The distribution of video content features may change over time for various reasons (i.e. light and weather change) , leading to accuracy degradation of existing models, to solve this problem, recent work proposes a framework that uses a remote server to continually train and adapt the li… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  30. arXiv:2406.01911  [pdf, ps, other

    cs.SI cs.DS

    Influence Maximization in Hypergraphs by Stratified Sampling for Efficient Generation of Reverse Reachable Sets

    Authors: Lingling Zhang, Hong Jiang, Ye Yuan, Guoren Wang

    Abstract: Given a hypergraph, influence maximization (IM) is to discover a seed set containing $k$ vertices that have the maximal influence. Although the existing vertex-based IM algorithms perform better than the hyperedge-based algorithms by generating random reverse researchable (RR) sets, they are inefficient because (i) they ignore important structural information associated with hyperedges and thus ob… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages,10figures

  31. arXiv:2406.01767  [pdf, other

    cs.RO

    Region-aware Grasp Framework with Normalized Grasp Space for 6-DoF Gras** in Cluttered Scene

    Authors: Siang Chen, Pengwei Xie, Wei Tang, Dingchang Hu, Gui** Wang

    Abstract: Regional geometric information is crucial for determining grasp poses. A series of region-based methods succeed in extracting regional features and enhancing grasp detection quality. However, faced with a cluttered scene with multiple objects and potential collision, the definition of the grasp-relevant region remains inconsistent among methods, and the relationship between grasps and regional spa… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  32. arXiv:2406.00488  [pdf, other

    cs.LG cs.DC

    Federated Model Heterogeneous Matryoshka Representation Learning

    Authors: Li** Yi, Han Yu, Chao Ren, Gang Wang, Xiaoguang Liu, Xiaoxiao Li

    Abstract: Model heterogeneous federated learning (MHeteroFL) enables FL clients to collaboratively train models with heterogeneous structures in a distributed fashion. However, existing MHeteroFL methods rely on training loss to transfer knowledge between the client model and the server model, resulting in limited knowledge exchange. To address this limitation, we propose the Federated model heterogeneous M… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  33. arXiv:2405.20881  [pdf, other

    cs.CV

    S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion

    Authors: Haolong Ma, Hui Li, Chunyang Cheng, Gaoang Wang, Xiaoning Song, Xiaojun Wu

    Abstract: As one of the tasks in Image Fusion, Infrared and Visible Image Fusion aims to integrate complementary information captured by sensors of different modalities into a single image. The Selective State Space Model (SSSM), known for its ability to capture long-range dependencies, has demonstrated its potential in the field of computer vision. However, in image fusion, current methods underestimate th… ▽ More

    Submitted 3 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  34. arXiv:2405.20339  [pdf, other

    cs.CV

    Visual Perception by Large Language Model's Weights

    Authors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

    Abstract: Existing Multimodal Large Language Models (MLLMs) follow the paradigm that perceives visual information by aligning visual features with the input space of Large Language Models (LLMs), and concatenating visual tokens with text tokens to form a unified sequence input for LLMs. These methods demonstrate promising results on various vision-language tasks but are limited by the high computational eff… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  35. Image-to-Joint Inverse Kinematic of a Supportive Continuum Arm Using Deep Learning

    Authors: Shayan Sepahvand, Guanghui Wang, Farrokh Janabi-Sharifi

    Abstract: In this work, a deep learning-based technique is used to study the image-to-joint inverse kinematics of a tendon-driven supportive continuum arm. An eye-off-hand configuration is considered by mounting a camera at a fixed pose with respect to the inertial frame attached at the arm base. This camera captures an image for each distinct joint variable at each sampling time to construct the training d… ▽ More

    Submitted 26 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Presented at the Candian Conference on Robots and Vision (CRV)

    Journal ref: Proceedings of the Conference on Robots and Vision (2024)

  36. arXiv:2405.20136  [pdf, other

    cs.CV

    A Multimodal Dangerous State Recognition and Early Warning System for Elderly with Intermittent Dementia

    Authors: Liyun Deng, Lei **, Guangcheng Wang, Quan Shi, Han Wang

    Abstract: In response to the social issue of the increasing number of elderly vulnerable groups going missing due to the aggravating aging population in China, our team has developed a wearable anti-loss device and intelligent early warning system for elderly individuals with intermittent dementia using artificial intelligence and IoT technology. This system comprises an anti-loss smart helmet, a cloud comp… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 13 pages,9 figures

  37. arXiv:2405.19333  [pdf, other

    cs.CV

    Multi-Modal Generative Embedding Model

    Authors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

    Abstract: Most multi-modal tasks can be formulated into problems of either generation or embedding. Existing models usually tackle these two types of problems by decoupling language modules into a text decoder for generation, and a text encoder for embedding. To explore the minimalism of multi-modal paradigms, we attempt to achieve only one model per modality in this work. We propose a Multi-Modal Generativ… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  38. arXiv:2405.18533  [pdf, other

    eess.IV cs.CV

    Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba

    Authors: Zefan Yang, Jia** Zhang, Ge Wang, Mannudeep K. Kalra, **kun Yan

    Abstract: Accurate prediction of Cardiovascular disease (CVD) risk in medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray emits significantly lower level… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Early accepted paper for MICCAI 2024

  39. arXiv:2405.17998  [pdf, other

    cs.IR cs.AI cs.CL

    Source Echo Chamber: Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop

    Authors: Yuqi Zhou, Sunhao Dai, Liang Pang, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen

    Abstract: Recently, researchers have uncovered that neural retrieval models prefer AI-generated content (AIGC), called source bias. Compared to active search behavior, recommendation represents another important means of information acquisition, where users are more prone to source bias. Furthermore, delving into the recommendation scenario, as AIGC becomes integrated within the feedback loop involving user… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  40. arXiv:2405.17418  [pdf, other

    cs.CV

    Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

    Authors: Jiaming Liu, Chenxuan Li, Guanqun Wang, Lily Lee, Kaichen Zhou, Sixiang Chen, Chuyan Xiong, Jiaxin Ge, Renrui Zhang, Shanghang Zhang

    Abstract: Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failure action is essential for a practical robotic system. Recently, Multimodal Large Language Models (MLLMs) have shown promise in visual instruction following and demonstrated strong reasoning abilities in va… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  41. arXiv:2405.17229  [pdf, other

    cs.HC

    InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning

    Authors: Guozheng Li, Peng He, Xinyu Wang, Runfei Li, Chi Harold Liu, Chuangxin Ou, Dong He, Guoren Wang

    Abstract: Embedding visual representations within original hierarchical tables can mitigate additional cognitive load stemming from the division of users' attention. The created hierarchical table visualizations can help users understand and explore complex data with multi-level attributes. However, because of many options available for transforming hierarchical tables and selecting subsets for embedding, t… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  42. arXiv:2405.16925  [pdf, other

    cs.CV

    OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

    Authors: Guan Wang, Zhimin Li, Qingchao Chen, Yang Liu

    Abstract: Dynamic Scene Graph Generation (DSGG) focuses on identifying visual relationships within the spatial-temporal domain of videos. Conventional approaches often employ multi-stage pipelines, which typically consist of object detection, temporal association, and multi-relation classification. However, these methods exhibit inherent limitations due to the separation of multiple stages, and independent… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR'24

  43. arXiv:2405.16546  [pdf, other

    cs.IR cs.CL

    Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration

    Authors: Sunhao Dai, Weihao Liu, Yuqi Zhou, Liang Pang, Rongju Ruan, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen

    Abstract: The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet, transforming the corpus of Information Retrieval (IR) systems from solely human-written to a coexistence with LLM-generated content. The impact of this surge in AIGC on IR systems remains an open question, with the primary challenge being the lack of a dedicated benchmark for rese… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by Findings of ACL 2024; Datasets Link: https://huggingface.co/IR-Cocktail

  44. arXiv:2405.16120  [pdf, other

    cs.IR

    BankFair: Balancing Accuracy and Fairness under Varying User Traffic in Recommender System

    Authors: Xiaopeng Ye, Chen Xu, Jun Xu, Xuyang Xie, Gang Wang, Zhenhua Dong

    Abstract: Driven by sustainability and economic considerations, two-sided recommendation platforms are required to satisfy the needs of both users and providers. Previous studies often indicate that the two sides' needs differ in urgency: providers have relatively long-term exposure requirements, while users desire short-term, accurate services. However, our empirical study reveals that existing methods for… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  45. arXiv:2405.16064  [pdf, other

    cs.CL

    Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

    Authors: Kaituo Feng, Changsheng Li, Xiaolu Zhang, Jun Zhou, Ye Yuan, Guoren Wang

    Abstract: Chain-of-thought distillation is a powerful technique for transferring reasoning abilities from large language models (LLMs) to smaller student models. Previous methods typically require the student to mimic the step-by-step rationale produced by LLMs, often facing the following challenges: (i) Tokens within a rationale vary in significance, and treating them equally may fail to accurately mimic k… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  46. arXiv:2405.15728  [pdf, other

    cs.CV

    Disease-informed Adaptation of Vision-Language Models

    Authors: Jia** Zhang, Ge Wang, Mannudeep K. Kalra, **kun Yan

    Abstract: In medical image analysis, the expertise scarcity and the high cost of data annotation limits the development of large artificial intelligence models. This paper investigates the potential of transfer learning with pre-trained vision-language models (VLMs) in this domain. Currently, VLMs still struggle to transfer to the underrepresented diseases with minimal presence and new diseases entirely abs… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Early Accepted by MICCAI 2024

  47. arXiv:2405.14629  [pdf, other

    cs.LG cs.AI

    Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences

    Authors: Takuya Hiraoka, Guanquan Wang, Takashi Onishi, Yoshimasa Tsuruoka

    Abstract: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence of these experiences is valuable for various purposes, such as identifying experiences that negatively influence poorly performing RL agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. Howev… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Source code: https://github.com/TakuyaHiraoka/Which-Experiences-Are-Influential-for-RL-Agents

  48. arXiv:2405.13325  [pdf, other

    cs.CL cs.AI cs.IR

    DEGAP: Dual Event-Guided Adaptive Prefixes for Templated-Based Event Argument Extraction with Slot Querying

    Authors: Guanghui Wang, Dexi Liu, Jian-Yun Nie, Qizhi Wan, Rong Hu, Xi** Liu, Wanlong Liu, Jiaming Liu

    Abstract: Recent advancements in event argument extraction (EAE) involve incorporating useful auxiliary information into models during training and inference, such as retrieved instances and event templates. These methods face two challenges: (1) the retrieval results may be irrelevant and (2) templates are developed independently for each event without considering their possible relationship. In this work,… ▽ More

    Submitted 15 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  49. arXiv:2405.12474  [pdf, other

    cs.LG cs.SI

    How Universal Polynomial Bases Enhance Spectral Graph Neural Networks: Heterophily, Over-smoothing, and Over-squashing

    Authors: Keke Huang, Yu Guang Wang, Ming Li, and Pietro LiĆ²

    Abstract: Spectral Graph Neural Networks (GNNs), alternatively known as graph filters, have gained increasing prevalence for heterophily graphs. Optimal graph filters rely on Laplacian eigendecomposition for Fourier transform. In an attempt to avert prohibitive computations, numerous polynomial filters have been proposed. However, polynomials in the majority of these filters are predefined and remain fixed… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.18177

  50. arXiv:2405.12218  [pdf, other

    cs.CV

    Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

    Authors: Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

    Abstract: We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Project page: https://mvsgaussian.github.io/