Showing 1–50 of 161 results for author: Zou, X

Searching in archive cs.
  1. arXiv:2406.18007

    cs.MM

    Deep Mamba Multi-modal Learning

    Authors: Jian Zhu, Xin Zou, Yu Cui, Zhangmin Huang, Chenshu Hu, Bo Lyu

    Abstract: Inspired by the excellent performance of Mamba networks, we propose a novel Deep Mamba Multi-modal Learning (DMML). It can be used to achieve the fusion of multi-modal features. We apply DMML to the field of multimedia retrieval and propose an innovative Deep Mamba Multi-modal Hashing (DMMH) method. It combines the advantages of algorithm accuracy and inference speed. We validated the effectivenes…

    Submitted 9 April, 2024; originally announced June 2024.

    Comments: Deep Mamba Multi-modal Learning; Deep Mamba Multi-modal Hashing

  2. arXiv:2406.17952

    cs.LG cs.CG

    LINSCAN -- A Linearity Based Clustering Algorithm

    Authors: Andrew Dennehy, Xiaoyu Zou, Shabnam J. Semnani, Yuri Fialko, Alexander Cloninger

    Abstract: DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approx…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.16020

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in co…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

  4. arXiv:2406.14069

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.12018

    cs.CL

    CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

    Authors: Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung

    Abstract: Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexit…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Work in progress

  6. arXiv:2406.06887

    cs.CL cs.AI cs.LG cs.PL cs.SE

    PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models

    Authors: Dylan Zhang, Shizhe Diao, Xueyan Zou, Hao Peng

    Abstract: Instruction-finetuned code language models (LMs) have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language instructions and gold code snippet pairs. Recent evidence suggests that these models, never exposed to incorrect solutions during training, often struggle to distinguish between correct and incorrect solutions. This observation…

    Submitted 10 June, 2024; originally announced June 2024.

  7. arXiv:2405.18991

    cs.CV cs.CL cs.MM

    EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

    Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

    Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the producti…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 5 figures

  8. arXiv:2405.14135

    cs.LG cs.AI

    Learning Geospatial Region Embedding with Heterogeneous Graph

    Authors: Xingchen Zou, Jiani Huang, Xixuan Hao, Yuhao Yang, Haomin Wen, Yibo Yan, Chao Huang, Yuxuan Liang

    Abstract: Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency of effective intra-region feature representation; and second, the difficulty of learning from intricate inter-region dependencies. In this paper, we…

    Submitted 22 May, 2024; originally announced May 2024.

  9. arXiv:2405.01204

    cs.CV cs.AI

    Towards Cross-Scale Attention and Surface Supervision for Fractured Bone Segmentation in CT

    Authors: Yu Zhou, Xiahao Zou, Yi Wang

    Abstract: Bone segmentation is an essential step for the preoperative planning of fracture trauma surgery. The automated segmentation of fractured bone from computed tomography (CT) scans remains challenging, due to the large differences of fractures in position and morphology, and also the inherent anatomical characteristics of different bone structures. To alleviate these issues, we propose a cross-scale…

    Submitted 2 May, 2024; originally announced May 2024.

  10. arXiv:2404.00727

    cs.CL

    A Controlled Reevaluation of Coreference Resolution Models

    Authors: Ian Porada, Xiyuan Zou, Jackie Chi Kit Cheung

    Abstract: All state-of-the-art coreference resolution (CR) models involve finetuning a pretrained language model. Whether the superior performance of one CR model over another is due to the choice of language model or other factors, such as the task-specific architecture, is difficult or impossible to determine due to the lack of a standardized experimental setup. To resolve this ambiguity, we systematically ev…

    Submitted 22 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024

  11. arXiv:2403.19980

    cs.CV

    A Parallel Attention Network for Cattle Face Recognition

    Authors: Jiayu Li, Xuechao Zou, Shiying Wang, Ben Chen, Junliang Xing, Pin Tao

    Abstract: Cattle face recognition holds paramount significance in domains such as animal husbandry and behavioral research. Despite significant progress in confined environments, applying these accomplishments in wild settings remains challenging. Thus, we create the first large-scale cattle face recognition dataset, ICRWE, for wild environments. It encompasses 483 cattle and 9,816 high-resolution image sam…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  12. Coverage-Guaranteed Prediction Sets for Out-of-Distribution Data

    Authors: Xin Zou, Weiwei Liu

    Abstract: Out-of-distribution (OOD) generalization has attracted increasing research attention in recent years, due to its promising experimental results in real-world applications. In this paper, we study the confidence set prediction problem in the OOD generalization setting. Split conformal prediction (SCP) is an efficient framework for handling the confidence set prediction problem. However, the validity…

    Submitted 28 March, 2024; originally announced March 2024.

    Journal ref: AAAI (2024) Vol. 38, No. 15, pages 17263-17270

  13. arXiv:2403.14135

    eess.IV cs.CV

    Powerful Lossy Compression for Noisy Images

    Authors: Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou

    Abstract: Image compression and denoising represent fundamental challenges in image processing with many real-world applications. To address practical demands, current solutions can be categorized into two main strategies: 1) sequential methods; and 2) joint methods. However, sequential methods have the disadvantage of error accumulation as there is information loss between multiple individual models. Recentl…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  14. arXiv:2403.11373

    cs.CV

    Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration

    Authors: Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu

    Abstract: Pre-trained large multi-modal models (LMMs) exploit fine-tuning to adapt to diverse user applications. Nevertheless, fine-tuning may face challenges due to deactivated sensors (e.g., cameras turned off for privacy or technical issues), yielding modality-incomplete data and leading to inconsistency between training and inference data. Additionally, continuous training leads to catastrophic for…

    Submitted 17 March, 2024; originally announced March 2024.

  15. arXiv:2403.10920

    cs.CR

    Batch-oriented Element-wise Approximate Activation for Privacy-Preserving Neural Networks

    Authors: Peng Zhang, Ao Duan, Xianglu Zou, Yuhong Liu

    Abstract: Privacy-Preserving Neural Networks (PPNN) are designed to perform inference without breaching user privacy, and can serve as an essential tool for medical diagnosis that simultaneously achieves big data utility and privacy protection. As one of the key techniques enabling PPNN, Fully Homomorphic Encryption (FHE) faces a major challenge: homomorphic operations cannot be easily adapted for n…

    Submitted 16 March, 2024; originally announced March 2024.

  16. arXiv:2403.08572

    cs.LG

    Caformer: Rethinking Time Series Analysis from Causal Perspective

    Authors: Kexuan Zhang, Xiaobei Zou, Yang Tang

    Abstract: Time series analysis is a vital task with broad applications in various domains. However, effectively capturing cross-dimension and cross-time dependencies in non-stationary time series poses significant challenges, particularly in the context of environmental factors. The spurious correlation induced by the environment confounds the causal relationships between cross-dimension and cross-time depe…

    Submitted 13 March, 2024; originally announced March 2024.

  17. arXiv:2403.02601

    eess.IV cs.CV

    Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

    Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

    Abstract: For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (L…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  18. arXiv:2402.19348

    cs.LG cs.AI

    Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook

    Authors: Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang

    Abstract: As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we are witnessing a rising trend that utilizes various deep-learning m…

    Submitted 16 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  19. arXiv:2402.15758

    cs.CL cs.AI

    Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens

    Authors: Ziqian Zeng, Jiahong Yu, Qianshi Pang, Zihao Wang, Huiping Zhuang, Hongen Shao, Xiaofeng Zou

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their widespread application is hindered by the resource-intensive decoding process. To address this challenge, current approaches have incorporated additional decoding heads to enable parallel prediction of multiple subsequent tokens, thereby achieving inference acceleration. Nevertheless, the ac…

    Submitted 18 April, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  20. arXiv:2402.14857

    cs.CL cs.AI cs.CR

    Is the System Message Really Important to Jailbreaks in Large Language Models?

    Authors: Xiaotian Zou, Yongkang Chen, Ke Li

    Abstract: The rapid evolution of Large Language Models (LLMs) has rendered them indispensable in modern society. While security measures are typically applied to align LLMs with human values prior to release, recent studies have unveiled a concerning phenomenon named "Jailbreak". This term refers to the unexpected and potentially harmful responses generated by LLMs when prompted with malicious questions. Most exist…

    Submitted 18 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 13 pages, 3 figures

  21. arXiv:2401.14427

    cs.SE cs.CR cs.LG

    Beimingwu: A Learnware Dock System

    Authors: Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiao-Chuan Zou, Yang Yu, Zhi-Hua Zhou

    Abstract: The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond models' original purposes. In this paradigm, developers worldwide can submit their high-performing models spontaneously to the learnware dock system (formerly known as learnwa…

    Submitted 24 January, 2024; originally announced January 2024.

  22. arXiv:2401.06715

    cs.CL cs.AI

    Reframing Tax Law Entailment as Analogical Reasoning

    Authors: Xinrui Zou, Ming Zhang, Nathaniel Weir, Benjamin Van Durme, Nils Holzenberger

    Abstract: Statutory reasoning refers to the application of legislative provisions to a series of case facts described in natural language. We re-frame statutory reasoning as an analogy task, where each instance of the analogy task involves a combination of two instances of statutory reasoning. This increases the dataset size by two orders of magnitude, and introduces an element of interpretability. We show…

    Submitted 12 January, 2024; originally announced January 2024.

  23. arXiv:2312.12236

    cs.LG cs.IT math.ST

    Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure

    Authors: Xinying Zou, Samir M. Perlaza, Iñaki Esnaola, Eitan Altman

    Abstract: In this paper, the worst-case probability measure over the data is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. More specifically, the worst-case probability measure is a Gibbs probability measure and the unique solution to the maximization of the expected loss under a relative entropy constraint with respect to a reference probability mea…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: To appear in the Proceedings of the AAAI Conference on Artificial Intelligence (7 + 2 pages)

    Report number: INRIA Technical Report RR-9515

  24. arXiv:2312.07532

    cs.CV cs.AI cs.CL

    Interfacing Foundation Models' Embeddings

    Authors: Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

    Abstract: We present FIND, a generalized interface for aligning foundation models' embeddings. As shown in the teaser figure, a lightweight transformer interface without tuning any foundation model weights is enough for a unified image (segmentation) and dataset-level (retrieval) understanding. The proposed interface has the following favorable attributes: (1) Generalizable. It applies to various tasks spanning…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: CODE: https://github.com/UX-Decoder/FIND

  25. arXiv:2312.07141

    cs.CL

    Multilingual large language models leak human stereotypes across language boundaries

    Authors: Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daume III

    Abstract: Multilingual large language models have become increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language mode…

    Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  26. arXiv:2312.02949

    cs.CV

    LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

    Authors: Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

    Abstract: With the recent significant advancements in large multi-modal models (LMMs), the importance of their grounding capability in visual chat is increasingly recognized. Despite recent efforts to enable LMMs to support grounding, their capabilities for grounding and chat are usually separate, and their chat performance drops dramatically when asked to ground. The problem is the lack of a dataset for gr…

    Submitted 5 December, 2023; originally announced December 2023.

  27. arXiv:2312.02646

    cs.LG cs.AI

    SAMSGL: Series-Aligned Multi-Scale Graph Learning for Spatio-Temporal Forecasting

    Authors: Xiaobei Zou, Luolin Xiong, Yang Tang, Jürgen Kurths

    Abstract: Spatio-temporal forecasting in various domains, like traffic prediction and weather forecasting, is a challenging endeavor, primarily due to the difficulties in modeling propagation dynamics and capturing high-dimensional interactions among nodes. Despite the significant strides made by graph-based networks in spatio-temporal forecasting, there remain two pivotal factors closely related to forecas…

    Submitted 27 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted by Chaos

  28. arXiv:2311.16512

    cs.CV cs.AI

    CoSeR: Bridging Image and Language for Cognitive Super-Resolution

    Authors: Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, Yujiu Yang

    Abstract: Existing super-resolution (SR) models primarily focus on restoring local texture details, often neglecting the global semantic information within the scene. This oversight can lead to the omission of crucial semantic details or the introduction of inaccurate textures during the recovery process. In our work, we introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with t…

    Submitted 20 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Project page: https://coser-main.github.io ; GitHub repository: https://github.com/VINHYU/CoSeR

  29. arXiv:2311.13601

    cs.CV cs.AI cs.LG

    Visual In-Context Prompting

    Authors: Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

    Abstract: In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain. Existing visual prompting methods focus on referring segmentation to segment the most relevant object, falling short of addressing many generic vision tasks like open-set segmentation and detection. In this paper, we introduce…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: technical report

  30. arXiv:2311.12083

    cs.CV eess.IV

    PanBench: Towards High-Resolution and High-Performance Pansharpening

    Authors: Shiying Wang, Xuechao Zou, Kai Li, Junliang Xing, Pin Tao

    Abstract: Pansharpening, a pivotal task in remote sensing, involves integrating low-resolution multispectral images with high-resolution panchromatic images to synthesize an image that is both high-resolution and retains multispectral information. These pansharpened images enhance precision in land cover classification, change detection, and environmental monitoring within remote sensing data analysis. Whil…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  31. arXiv:2311.05437

    cs.CV cs.AI cs.CL cs.LG cs.MM

    LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

    Authors: Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

    Abstract: LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models. It maintains a skill repository of pre-trained vision and vision-language models and can activate relevant tools based on users' inputs to fulfill real-world tasks. LLaVA-Plus is trained on multimodal instruction-following data to acquire the ability to use tools, covering visual understa…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 25 pages, 25M file size. Project Page: https://llava-vl.github.io/llava-plus/

  32. arXiv:2311.05112

    cs.CL cs.AI

    A Survey of Large Language Models in Medicine: Progress, Application, and Challenge

    Authors: Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Chenyu You, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton

    Abstract: Large language models (LLMs), such as ChatGPT, have received substantial attention due to their capabilities for understanding and generating human language. While there has been a burgeoning trend in research focusing on the employment of LLMs in supporting different medical tasks (e.g., enhancing clinical diagnostics and providing medical education), a review of these efforts, particularly their…

    Submitted 15 May, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Preprint. Version 5. 6 figures; 14 tables; 41 pages

  33. arXiv:2311.03369

    cs.LG cs.AI cs.CR

    Can We Trust the Similarity Measurement in Federated Learning?

    Authors: Zhilin Wang, Qin Hu, Xukai Zou

    Abstract: Is it secure to measure the reliability of local models by similarity in federated learning (FL)? This paper delves into an unexplored security threat concerning applying similarity metrics, such as the L2 norm, Euclidean distance, and cosine similarity, in protecting FL. We first uncover the deficiencies of similarity metrics that high-dimensional local models, including benign and poisoned mode…

    Submitted 20 October, 2023; originally announced November 2023.

  34. arXiv:2310.13482

    cs.HC cs.MM

    HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children

    Authors: Chengyan Yu, Shihuan Wang, Dong Zhang, Yingying Zhang, Chaoqun Cen, Zhixiang You, Xiaobing Zou, Hongzhu Deng, Ming Li

    Abstract: Numerous children diagnosed with Autism Spectrum Disorder (ASD) exhibit abnormal eye gaze patterns in communication and social interaction. Due to the high cost of ASD interventions and a shortage of professional therapists, researchers have explored the use of virtual reality (VR) systems as a supplementary intervention for autistic children. This paper presents the design of a novel VR-based syst…

    Submitted 20 October, 2023; originally announced October 2023.

  35. arXiv:2310.11441

    cs.CV cs.AI cs.CL cs.HC

    Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

    Authors: Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao

    Abstract: We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V. As illustrated in Fig. 1 (right), we employ off-the-shelf interactive segmentation models, such as SEEM/SAM, to partition an image into regions at different levels of granularity, and overlay these regions with a set of marks e.g., alphanumerics,…

    Submitted 6 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

  36. arXiv:2310.05180

    cs.CR

    Blockchain-Envisioned UAV-Aided Disaster Relief Networks: Challenges and Solutions

    Authors: Yuntao Wang, Qinnan Hu, Zhendong Li, Zhou Su, Ruidong Li, Xiang Zou, Jian Zhou

    Abstract: Natural or man-made disasters pose significant challenges for delivering critical relief to affected populations due to disruptions in critical infrastructures and logistics networks. Unmanned aerial vehicles (UAVs)-aided disaster relief networks (UDRNs) leverage UAVs to assist existing ground relief networks by swiftly assessing affected areas and timely delivering lifesaving supplies. To meet th…

    Submitted 24 May, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 8 pages

  37. arXiv:2310.04672

    cs.CV

    EasyPhoto: Your Smart AI Photo Generator

    Authors: Ziheng Wu, Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Xing Shi, Jun Huang

    Abstract: Stable Diffusion web UI (SD-WebUI) is a comprehensive project that provides a browser interface based on the Gradio library for Stable Diffusion models. In this paper, we propose a novel WebUI plugin called EasyPhoto, which enables the generation of AI portraits. By training a digital doppelganger of a specific user ID using 5 to 20 relevant images, the finetuned model (according to the trained LoRA m…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 7 pages, 7 figures

  38. arXiv:2310.02887

    cs.CV

    A Grammatical Compositional Model for Video Action Detection

    Authors: Zhijun Zhang, Xu Zou, Jiahuan Zhou, Sheng Zhong, Ying Wu

    Abstract: Analysis of human actions in videos demands understanding complex human dynamics, as well as the interaction between actors and context. However, these interaction relationships usually exhibit large intra-class variations from diverse human poses or object manipulations, and fine-grained inter-class differences between similar actions. Thus the performance of existing methods is severely limited.…

    Submitted 4 October, 2023; originally announced October 2023.

  39. arXiv:2309.15367

    cs.RO eess.SY

    Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range

    Authors: Xinran Li, Shuaikang Zheng, Pengcheng Zheng, Haifeng Zhang, Zhitian Li, Xudong Zou

    Abstract: Relative pose estimation is a foundational requirement for multi-robot systems, yet it remains a challenging research topic in infrastructure-free scenes. In this study, we analyze the relative 6-DOF pose estimation error of multi-robot systems in GNSS-denied and anchor-free environments. An analytical lower bound of position and orientation estimation error is given under the assumption that distance…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 7 pages, 9 figures

  40. arXiv:2309.14660

    cs.CV cs.AI cs.RO

    CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration

    Authors: Shuhao Kang, Youqi Liao, Jianping Li, Fuxun Liang, Yuhao Li, Xianghong Zou, Fangning Li, Xieyuanli Chen, Zhen Dong, Bisheng Yang

    Abstract: Image-to-point cloud (I2P) registration is a fundamental task for robots and autonomous vehicles to achieve cross-modality data fusion and localization. Existing I2P registration methods estimate correspondences at the point/pixel level, often overlooking global alignment. However, I2P matching can easily converge to a local optimum when performed without high-level guidance from global constraint…

    Submitted 14 May, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE RA-L (under review); project page is available at: https://whu-usi3dv.github.io/CoFiI2P

  41. arXiv:2309.05534

    cs.CL cs.AI cs.CV

    PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud

    Authors: Chengyu Wang, Zhongjie Duan, Bingyan Liu, Xinyi Zou, Cen Chen, Kui Jia, Jun Huang

    Abstract: Text-to-image synthesis for the Chinese language poses unique challenges due to its large vocabulary size and intricate character relationships. While existing diffusion models have shown promise in generating images from textual descriptions, they often neglect domain-specific contexts and lack robustness in handling the Chinese language. This paper introduces PAI-Diffusion, a comprehensive fram…

    Submitted 11 September, 2023; originally announced September 2023.

  42. arXiv:2309.00314

    cs.CV

    ARFA: An Asymmetric Receptive Field Autoencoder Model for Spatiotemporal Prediction

    Authors: Wenxuan Zhang, Xuechao Zou, Li Wu, Xiaoying Wang, Jianqiang Huang, Junliang Xing

    Abstract: Spatiotemporal prediction aims to generate future sequences by paradigms learned from historical contexts. It is essential in numerous domains, such as traffic flow prediction and weather forecasting. Recently, research in this field has been predominantly driven by deep neural networks based on autoencoder architectures. However, existing methods commonly adopt autoencoder architectures with iden…

    Submitted 8 January, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  43. arXiv:2308.13323

    cs.CV cs.RO

    SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation

    Authors: Xuechao Chen, Shuangjie Xu, Xiaoyi Zou, Tongyi Cao, Dit-Yan Yeung, Lu Fang

    Abstract: LiDAR-based semantic perception tasks are critical yet challenging for autonomous driving. Due to the motion of objects and static/dynamic occlusion, temporal information plays an essential role in reinforcing perception by enhancing and completing single-frame knowledge. Previous approaches either directly stack historical frames to the current frame or build a 4D spatio-temporal neighborhood usi…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  44. arXiv:2308.11985

    cs.DC cs.PF

    DSSP: A Distributed, SLO-aware, Sensing-domain-privacy-Preserving Architecture for Sensing-as-a-Service

    Authors: Lin Sun, Todd Rosenkrantz, Prathyusha Enganti, Huiyang Li, Zhijun Wang, Hao Che, Hong Jiang, Xukai Zou

    Abstract: In this paper, we propose DSSP, a Distributed, SLO-aware, Sensing-domain-privacy-Preserving architecture for Sensing-as-a-Service (SaS). DSSP addresses four major limitations of the current SaS architecture. First, to improve sensing quality and enhance geographic coverage, DSSP allows Independent sensing Administrative Domains (IADs) to participate in sensing services, while preserving the autono…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 14 pages

    ACM Class: C.2.4; C.4

  45. arXiv:2308.11978

    cs.LG cs.AI q-bio.BM stat.ML

    Will More Expressive Graph Neural Networks do Better on Generative Tasks?

    Authors: Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao

    Abstract: Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suff…

    Submitted 20 February, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: 2nd Learning on Graphs Conference (LoG 2023). 26 pages, 5 figures, 11 tables

  46. arXiv:2308.08443

    cs.CV

    High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

    Authors: Ben Chen, Xuechao Zou, Kai Li, Yu Zhang, Junliang Xing, Pin Tao

    Abstract: Lake extraction from remote sensing imagery is a complex challenge due to the varied lake shapes and data noise. Current methods rely on multispectral image datasets, making it challenging to learn lake features accurately from pixel arrangements. This, in turn, affects model learning and the creation of accurate segmentation masks. This paper introduces a prompt-based dataset construction approac…

    Submitted 31 March, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted by ICME 2024

  47. arXiv:2308.06596

    cs.IT cs.NI

    On the Performance Trade-off of Distributed Integrated Sensing and Communication Networks

    Authors: Xuran Li, Shuaishuai Guo, Tuo Li, Xiaofeng Zou, Dengwang Li

    Abstract: In this letter, we analyze the performance trade-off in distributed integrated sensing and communication (ISAC) networks. Specifically, with the aid of stochastic geometry theory, we derive the probability of detection and that of coverage for a given number of users. Based on the analytical derivations, we provide a quantitative description of the performance limits and the performance trade-off between…

    Submitted 12 August, 2023; originally announced August 2023.

  48. arXiv:2308.04417

    cs.CV cs.LG eess.IV

    DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images

    Authors: Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei **, Pin Tao

    Abstract: Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image qual…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 13 pages, 7 figures

  49. arXiv:2308.04397

    cs.CV

    LEFormer: A Hybrid CNN-Transformer Architecture for Accurate Lake Extraction from Remote Sensing Imagery

    Authors: Ben Chen, Xuechao Zou, Yu Zhang, Jiayu Li, Kai Li, Junliang Xing, Pin Tao

    Abstract: Lake extraction from remote sensing images is challenging due to the complex lake shapes and inherent data noises. Existing methods suffer from blurred segmentation boundaries and poor foreground modeling. This paper proposes a hybrid CNN-Transformer architecture, called LEFormer, for accurate lake extraction. LEFormer contains three main modules: CNN encoder, Transformer encoder, and cross-encode…

    Submitted 8 January, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by ICASSP 2024

  50. arXiv:2307.13953

    cs.CV cs.SD eess.AS

    The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

    Authors: Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

    Abstract: This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically involve using a long period of voice input, including generating face images from voices and reconstructing 3D face meshes from voices. However, in situations like voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiolo…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Interspeech 2023