Showing 1–50 of 1,256 results for author: Zhou, X

Searching in archive cs.
  1. arXiv:2407.01292  [pdf, other]

    cs.RO

    Preserving Relative Localization of FoV-Limited Drone Swarm via Active Mutual Observation

    Authors: Lianjie Guo, Zaitian Gongye, Ziyi Xu, Yingjian Wang, Xin Zhou, Jinni Zhou, Fei Gao

    Abstract: Relative state estimation is crucial for vision-based swarms to estimate and compensate for the unavoidable drift of visual odometry. For autonomous drones equipped with the most compact sensor setting -- a stereo camera that provides a limited field of view (FoV), the demand for mutual observation for relative state estimation conflicts with the demand for environment observation. To balance the…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024, 8 pages, 10 figures

  2. arXiv:2407.00100  [pdf, other]

    cs.LG cs.AI cs.CL

    Enhancing In-Context Learning via Implicit Demonstration Augmentation

    Authors: Xiaoling Zhou, Wei Ye, Yidong Wang, Chaoya Jiang, Zhemg Lee, Rui Xie, Shikun Zhang

    Abstract: The emergence of in-context learning (ICL) enables large pre-trained language models (PLMs) to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from t…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Main; 19 pages, 10 figures

    ACM Class: I.2.7

  3. arXiv:2406.19749  [pdf, other]

    eess.IV cs.CV

    SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou

    Abstract: Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2406.19354  [pdf, other]

    cs.CL cs.AI

    Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

    Authors: Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal

    Abstract: The model editing problem concerns how language models should learn new facts about the world over time. While empirical research on model editing has drawn widespread attention, the conceptual foundations of model editing remain shaky -- perhaps unsurprisingly, since model editing is essentially belief revision, a storied problem in philosophy that has eluded succinct solutions for decades. Model…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 23 pages, 4 figures

  5. arXiv:2406.18962  [pdf, other]

    cs.IR

    Multi-modal Food Recommendation using Clustering and Self-supervised Learning

    Authors: Yixin Zhang, Xin Zhou, Qianwen Meng, Fanglin Zhu, Yonghui Xu, Zhiqi Shen, Lizhen Cui

    Abstract: Food recommendation systems serve as pivotal components in the realm of digital lifestyle services, designed to assist users in discovering recipes and food items that resonate with their unique dietary predilections. Typically, multi-modal descriptions offer an exhaustive profile for each recipe, thereby ensuring recommendations that are both personalized and accurate. Our preliminary investigati…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Working paper

  6. arXiv:2406.18961  [pdf, other]

    cs.MA

    Formation Under Communication Constraints: Control Performance Meets Channel Capacity

    Authors: Yaru Chen, Yirui Cong, Xiangyun Zhou, Long Cheng, Xiangke Wang

    Abstract: In wireless communication-based formation control systems, the control performance is significantly impacted by the channel capacity of each communication link between agents. This relationship, however, remains under-investigated in the existing studies. To address this gap, the formation control problem of classical second-order multi-agent systems with bounded process noises was considered taki…

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.17182  [pdf, other]

    cs.IR cs.LG

    Debiased Recommendation with Noisy Feedback

    Authors: Haoxuan Li, Chunyuan Zheng, Wenjie Wang, Hao Wang, Fuli Feng, Xiao-Hua Zhou

    Abstract: Ratings of a user to most items in recommender systems are usually missing not at random (MNAR), largely because users are free to choose which items to rate. To achieve unbiased learning of the prediction model under MNAR data, three typical solutions have been proposed, including error-imputation-based (EIB), inverse-propensity-scoring (IPS), and doubly robust (DR) methods. However, these method…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: KDD 24 Research Track Paper

  8. arXiv:2406.15758  [pdf, other]

    cs.LG cs.DC

    EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

    Authors: Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin

    Abstract: Efficient adaption of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of the high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and ef…

    Submitted 22 June, 2024; originally announced June 2024.

  9. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed…

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  10. arXiv:2406.14098  [pdf, ps, other]

    cs.CV

    HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models

    Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Haoran Dou, Jun Cheng, Han Zhou, Dong Ni

    Abstract: Echocardiography (ECHO) video is widely used for cardiac examination. In clinical practice, this procedure heavily relies on operator experience, which needs years of training and possibly the assistance of deep learning-based systems for enhanced accuracy and efficiency. However, it is challenging since acquiring sufficient customized data (e.g., abnormal cases) for novice training and deep model development…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  11. arXiv:2406.13225  [pdf, other]

    cs.LG cs.AI cs.IR

    Communication-Efficient Federated Knowledge Graph Embedding with Entity-Wise Top-K Sparsification

    Authors: Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Dusit Niyato, Zhiqi Shen

    Abstract: Federated Knowledge Graphs Embedding learning (FKGE) encounters challenges in communication efficiency stemming from the considerable size of parameters and extensive communication rounds. However, existing FKGE methods only focus on reducing communication rounds by conducting multiple rounds of local training in each communication round, and ignore reducing the size of parameters transmitted with…

    Submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.11943  [pdf, other]

    cs.IR cs.AI

    Personalized Federated Knowledge Graph Embedding with Client-Wise Relation Graph

    Authors: Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Dusit Niyato, Zhiqi Shen

    Abstract: Federated Knowledge Graph Embedding (FKGE) has recently garnered considerable interest due to its capacity to extract expressive representations from distributed knowledge graphs, while concurrently safeguarding the privacy of individual clients. Existing FKGE methods typically harness the arithmetic mean of entity embeddings from all clients as the global supplementary knowledge, and learn a repl…

    Submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.10844  [pdf, other]

    eess.AS cs.SD

    Multi-Scale Accent Modeling with Disentangling for Multi-Speaker Multi-Accent TTS Synthesis

    Authors: Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

    Abstract: Synthesizing speech across different accents while preserving the speaker identity is essential for various real-world customer applications. However, the individual and accurate modeling of accents and speakers in a text-to-speech (TTS) system is challenging due to the complexity of accent variations and the intrinsic entanglement between the accent and speaker identity. In this paper, we present…

    Submitted 16 June, 2024; originally announced June 2024.

  14. arXiv:2406.10517  [pdf, other]

    cs.IR cs.AI cs.LG

    ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

    Authors: Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, Jie Jiang

    Abstract: Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive models (i.e., pLTV), severely limiting their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of the advertising platform, to expan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  15. arXiv:2406.10347  [pdf, other]

    cs.NI

    A Near-Optimal Category Information Sampling in RFID Systems

    Authors: Xiujun Wang, Zhi Liu, Xiaokang Zhou, Yong Liao, Han Hu, Xiao Zheng, Jie Li

    Abstract: In many RFID-enabled applications, objects are classified into different categories, and the information associated with each object's category (called category information) is written into the attached tag, allowing the reader to access it later. The category information sampling in such RFID systems, which is to randomly choose (sample) a few tags from each category and collect their category in…

    Submitted 18 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 37 pages, 11 figures

  16. arXiv:2406.10305  [pdf]

    cs.SE cs.AI cs.LG

    Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models

    Authors: Jie Chen, Xintian Han, Yu Ma, Xun Zhou, Liang Xiang

    Abstract: Automatic code generation has been a longstanding research topic. With the advancement of general-purpose large language models (LLMs), the ability to code stands out as one important measure of the model's reasoning performance. Usually, a two-stage training paradigm is implemented to obtain a Code LLM, namely the pretraining and the fine-tuning. Within the fine-tuning, supervised fine-tuning (SF…

    Submitted 13 June, 2024; originally announced June 2024.

  17. arXiv:2406.08657  [pdf, other]

    cs.CL

    Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs

    Authors: Chen Zheng, Ke Sun, Xun Zhou

    Abstract: Despite the advances in Large Language Models (LLMs), exemplified by models like GPT-4 and Claude, smaller-scale LLMs such as Llama and Mistral often struggle with generating in-depth and coherent dialogues. This paper presents a novel two-step Coarse-to-Fine Actor model to address the inherent limitations in conversational and analytical capabilities of small-sized LLMs. Our approach begins with…

    Submitted 12 June, 2024; originally announced June 2024.

  18. arXiv:2406.05130  [pdf, other]

    cs.CL

    An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

    Authors: Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Víctor Gutiérrez-Basulto, Jeff Z. Pan

    Abstract: Multimodal large language models (MLLMs) fine-tuned with multimodal instruction datasets have demonstrated remarkable capabilities in multimodal tasks. However, fine-tuning all parameters of MLLMs has become challenging as they usually contain billions of parameters. To address this issue, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs. We aim to identify effective methods for e…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ACL Findings 2024

  19. arXiv:2406.04906  [pdf, other]

    cs.CV cs.AI

    RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

    Authors: Liting Huang, Zhihao Zhang, Yiran Zhang, Xiyue Zhou, Shoujin Wang

    Abstract: The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robu…

    Submitted 7 June, 2024; originally announced June 2024.

  20. arXiv:2406.04371  [pdf, other]

    cs.CL cs.AI

    Phased Instruction Fine-Tuning for Large Language Models

    Authors: Wei Pang, Chuan Zhou, Xiao-Hua Zhou, Xiaojie Wang

    Abstract: Instruction Fine-Tuning enhances pre-trained language models from basic next-word prediction to complex instruction-following. However, the existing One-off Instruction Fine-Tuning (One-off IFT) method, applied to diverse instructions, may not effectively boost models' adherence to instructions due to the simultaneous handling of varying instruction complexities. To improve this, Phased Instruction F…

    Submitted 16 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: The final version, to appear at ACL 2024 Findings

  21. arXiv:2406.03933  [pdf, other]

    cs.CR cs.IR

    Beyond Similarity: Personalized Federated Recommendation with Composite Aggregation

    Authors: Honglei Zhang, Haoxuan Li, Jundong Chen, Sen Cui, Kunda Yan, Abudukelimu Wuerkaixi, Xin Zhou, Zhiqi Shen, Yidong Li

    Abstract: Federated recommendation aims to collect global knowledge by aggregating local models from massive devices, to provide recommendations while ensuring privacy. Current methods mainly leverage aggregation functions invented by the federated vision community to aggregate parameters from similar clients, e.g., clustering aggregation. Despite considerable performance, we argue that it is suboptimal to appl…

    Submitted 6 June, 2024; originally announced June 2024.

  22. arXiv:2406.02914  [pdf]

    cs.CV eess.IV

    A Self-Supervised Denoising Strategy for Underwater Acoustic Camera Imageries

    Authors: Xiaoteng Zhou, Katsunori Mizuno, Yilong Zhang

    Abstract: In low-visibility marine environments characterized by turbidity and darkness, acoustic cameras serve as visual sensors capable of generating high-resolution 2D sonar images. However, acoustic camera images are corrupted by complex noise and are difficult for downstream visual algorithms to ingest directly. This paper introduces a novel strategy for denoising acoustic camera images using…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 8 pages

  23. arXiv:2406.01884  [pdf, other]

    cs.CV

    Rank-based No-reference Quality Assessment for Face Swapping

    Authors: Xinghui Zhou, Wenbo Zhou, Tianyi Wei, Shen Chen, Taiping Yao, Shouhong Ding, Weiming Zhang, Nenghai Yu

    Abstract: Face swapping has become a prominent research area in computer vision and image processing due to rapid technological advancements. The metric of measuring the quality in most face swapping methods relies on several distances between the manipulated images and the source image, or the target image, i.e., there are suitable known reference face images. Therefore, there is still a gap in accurately…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  24. arXiv:2406.01238  [pdf, other]

    cs.CL

    EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

    Authors: Zixuan Dong, Baoyun Peng, Yufei Wang, Jia Fu, Xiaodong Wang, Yongxue Shan, Xin Zhou

    Abstract: While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propos…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures, 3 tables

  25. arXiv:2406.01188  [pdf, other]

    cs.CV

    UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation

    Authors: Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang

    Abstract: Recent diffusion-based human image animation techniques have demonstrated impressive success in synthesizing videos that faithfully follow a given reference identity and a sequence of desired movement poses. Despite this, there are still two limitations: i) an extra reference model is required to align the identity image with the main video branch, which significantly increases the optimization bu…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://unianimate.github.io/

  26. arXiv:2405.19818  [pdf, other]

    cs.CV cs.AI

    WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark

    Authors: Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang

    Abstract: Underwater object tracking (UOT) is a foundational task for identifying and tracing submerged entities in underwater video sequences. However, current UOT datasets suffer from limitations in scale, diversity of target categories and scenarios covered, hindering the training and evaluation of modern tracking algorithms. To bridge this gap, we take the first step and introduce WebUOT-1M, i.e., the la…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: GitHub project: https://github.com/983632847/Awesome-Multimodal-Object-Tracking

  27. arXiv:2405.19256  [pdf, other]

    cs.LG math.NA

    Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation

    Authors: Zhiqiang Cai, Yu Cao, Yuanfei Huang, Xiang Zhou

    Abstract: Sampling invariant distributions from an Ito diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in both biased and correlated samples. Current deep learning-based methods solve the stationary Fokker–Planck equation to determine the…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 24 pages, 10 figures

  28. arXiv:2405.18723  [pdf, other]

    cs.LG cs.AI

    Conformal Depression Prediction

    Authors: Yonghong Li, Shan Qu, Xiuzhuang Zhou

    Abstract: While existing depression prediction methods based on deep learning show promise, their practical application is hindered by the lack of trustworthiness, as these deep models are often deployed as "black box" models, leaving us uncertain about the confidence of the model predictions. For high-risk clinical applications like depression prediction, uncertainty quantification is essential in d…

    Submitted 30 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  29. arXiv:2405.18657  [pdf, other]

    cs.NI

    The Efficacy of the Connect America Fund in Addressing US Internet Access Inequities

    Authors: Haarika Manda, Varshika Srinivasavaradhan, Laasya Koduru, Kevin Zhang, Xuanhe Zhou, Udit Paul, Elizabeth Belding, Arpit Gupta, Tejas N. Narechania

    Abstract: Residential fixed broadband internet access in the United States (US) has long been distributed inequitably, drawing significant attention from researchers and policymakers. This paper evaluates the efficacy of the Connect America Fund (CAF), a key policy intervention aimed at addressing disparities in US internet access. CAF subsidizes the creation of new regulated broadband monopolies in underse…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  30. arXiv:2405.17871  [pdf, other]

    cs.CV cs.AI cs.CL

    Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

    Authors: Xin Xiao, Bohong Wu, Jiacong Wang, Chunyuan Li, Xun Zhou, Haoyuan Guo

    Abstract: Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner. Despite being simple and effective, this method results in sub-optimal cross-modal alignment by over-emphasizing the text tokens that are less correlated with or even contradictory with the input images. In this paper, we advocate for assigning distinct contributions…

    Submitted 28 May, 2024; originally announced May 2024.

  31. arXiv:2405.17779  [pdf, other]

    cs.LG cs.RO

    Online Analytic Exemplar-Free Continual Learning with Large Models for Imbalanced Autonomous Driving Task

    Authors: Huiping Zhuang, Di Fang, Kai Tong, Yuchen Liu, Ziqian Zeng, Xu Zhou, Cen Chen

    Abstract: In the field of autonomous driving, even a meticulously trained model can encounter failures when faced with unfamiliar scenarios. One of these scenarios can be formulated as an online continual learning (OCL) problem. That is, data come in an online fashion, and models are updated according to these streaming data. Two major OCL challenges are catastrophic forgetting and data imbalance. To addres…

    Submitted 27 May, 2024; originally announced May 2024.

  32. arXiv:2405.17464  [pdf, other]

    cs.LG cs.AI stat.ML

    Data Valuation by Leveraging Global and Local Statistical Information

    Authors: Xiaoling Zhou, Ou Wu, Michael K. Ng, Hao Jiang

    Abstract: Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications, particularly in machine learning tasks. There are diverse technical avenues to quantify the value of data within a corpus. While Shapley value-based methods are among the most widely used techniques in the literature due to their solid theoretical foundation, the…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages, 8 figures. arXiv admin note: text overlap with arXiv:2306.10577 by other authors

    ACM Class: I.2

  33. arXiv:2405.16449  [pdf, other]

    cs.LG math.OC q-fin.MF

    Reinforcement Learning for Jump-Diffusions

    Authors: Xuefeng Gao, Lingfei Li, Xun Yu Zhou

    Abstract: We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration–exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the explora…

    Submitted 26 May, 2024; originally announced May 2024.

  34. arXiv:2405.16287  [pdf, other]

    cs.LG

    LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

    Authors: Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu

    Abstract: A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes a desired prediction for initial parameters more necessary nowadays. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vis…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 16 pages

  35. MINet: Multi-scale Interactive Network for Real-time Salient Object Detection of Strip Steel Surface Defects

    Authors: Kunye Shen, Xiaofei Zhou, Zhi Liu

    Abstract: Automated surface defect detection is a fundamental task in industrial production, and the existing saliency-based works overcome the challenging scenes and give promising detection results. However, the cutting-edge efforts often suffer from large parameter size, heavy computational cost, and slow inference speed, which heavily limit practical applications. To this end, we devise a multi-…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: accepted by IEEE Transactions on Industrial Informatics

  36. arXiv:2405.16036  [pdf, other]

    cs.LG cs.CR cs.CV

    Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness

    Authors: Jieren Deng, Hanbin Hong, Aaron Palmer, Xin Zhou, Jinbo Bi, Kaleel Mahmood, Yuan Hong, Derek Aguiar

    Abstract: Randomized smoothing has become a leading method for achieving certified robustness in deep classifiers against $l_{p}$-norm adversarial perturbations. Current approaches for achieving certified robustness, such as data augmentation with Gaussian noise and adversarial training, require expensive training procedures that tune large models for different Gaussian noise levels and thus cannot leverage h…

    Submitted 24 May, 2024; originally announced May 2024.

  37. arXiv:2405.15304  [pdf, other]

    cs.LG cs.CV

    Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

    Authors: Yongliang Wu, Shiji Zhou, Mingzhuo Yang, Lianzhe Wang, Wenbo Zhu, Heng Chang, Xiao Zhou, Xu Yang

    Abstract: Current text-to-image diffusion models have achieved groundbreaking results in image generation tasks. However, the unavoidable inclusion of sensitive information during pre-training introduces significant risks such as copyright infringement and privacy violations in the generated images. Machine Unlearning (MU) provides an effective way to remove the sensitive concepts captured by the model, has been sh…

    Submitted 24 May, 2024; originally announced May 2024.

  38. arXiv:2405.14200  [pdf, other]

    cs.CV cs.AI

    Awesome Multi-modal Object Tracking

    Authors: Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

    Abstract: Multi-modal object tracking (MMOT) is an emerging field that combines data from various modalities, e.g., vision (RGB), depth, thermal infrared, event, language and audio, to estimate the state of an arbitrary object in a video sequence. It is of great significance for many applications such as autonomous driving and intelligent surveillance. In recent years, MMOT has received more and more attentio…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: A continuously updated project to track the latest progress in multi-modal object tracking

  39. arXiv:2405.14073  [pdf, other]

    cs.LG

    PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

    Authors: Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu

    Abstract: Designing generalizable agents capable of adapting to diverse embodiments has attracted significant attention in Reinforcement Learning (RL), which is critical for deploying RL agents in various real-world applications. Previous Cross-Embodiment RL approaches have focused on transferring knowledge across embodiments within specific tasks. These methods often result in knowledge tightly coupled with…

    Submitted 22 May, 2024; originally announced May 2024.

  40. arXiv:2405.13955  [pdf]

    cs.HC cs.ET

    Cognitive Internet of Vulnerable Road Users in Traffic: Predictive Neural Modulations of Road Crossing Intention

    Authors: Xiaoshan Zhou, Carol C. Menassa, Vineet R. Kamat

    Abstract: Vulnerable Road Users (VRUs) present a significant challenge for road safety due to the frequent unpredictability of their behaviors. In typical Intelligent Transportation Systems, vision-based approaches supported by networked cameras are often used to anticipate VRUs' motion intentions and trajectories. However, several limitations posed by occlusions and distractions set a boundary for the effic…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 33 pages, 15 figures

  41. arXiv:2405.13300   

    cs.LG cs.AI

    FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting

    Authors: Ruiqi Li, Maowei Jiang, Kai Wang, Kaiduo Feng, Quangao Liu, Yue Sun, Xiufang Zhou

    Abstract: Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment. However, despite their considerable advantages over traditional statistical approaches, current deep learning-based predictive models often exhibit a significant deviation between their forecasting outcomes and the ground t…

    Submitted 1 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: We think there are some errors in the experimental results that may lead to a wrong conclusion, so we think the responsible course is to withdraw the paper.

  42. arXiv:2405.12724  [pdf, other]

    cs.CV

    RemoCap: Disentangled Representation Learning for Motion Capture

    Authors: Hongsheng Wang, Lizao Zhang, Zhangnan Zhong, Shuolin Xu, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

    Abstract: Reconstructing 3D human bodies from realistic motion sequences remains a challenge due to pervasive and complex occlusions. Current methods struggle to capture the dynamics of occluded body parts, leading to model penetration and distorted motion. RemoCap leverages Spatial Disentanglement (SD) and Motion Disentanglement (MD) to overcome these limitations. SD addresses occlusion interference betwee…

    Submitted 21 May, 2024; originally announced May 2024.

  43. arXiv:2405.12505  [pdf, other]

    cs.CV

    NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction

    Authors: Hongsheng Wang, Nanjie Yao, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

    Abstract: In the animation industry, 3D modelers typically rely on front and back non-overlapped concept designs to guide the 3D modeling of anime characters. However, there is currently a lack of automated approaches for generating anime characters directly from these 2D designs. In light of this, we explore a novel task of reconstructing anime characters from non-overlapped views. This presents two main c…

    Submitted 21 May, 2024; originally announced May 2024.

  44. arXiv:2405.12477  [pdf, other]

    cs.CV

    Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

    Authors: Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, **g Li, Zhanyun Tang, Shengyu Zhang, Fei Wu, Feng Lin

    Abstract: Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach i…

    Submitted 21 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  45. arXiv:2405.11850  [pdf, other]

    cs.CV

    Rethinking Overlooked Aspects in Vision-Language Models

    Authors: Yuan Liu, Le Tian, Xiao Zhou, Jie Zhou

    Abstract: Recent advancements in large vision-language models (LVLMs), such as GPT4-V and LLaVA, have been substantial. LLaVA's modular architecture, in particular, offers a blend of simplicity and efficiency. Recent works mainly focus on introducing more pre-training and instruction tuning data to improve model's performance. This paper delves into the often-neglected aspects of data efficiency during pre-…

    Submitted 20 May, 2024; originally announced May 2024.

  46. arXiv:2405.11315  [pdf, other]

    cs.CV

    MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection

    Authors: Ximiao Zhang, Min Xu, Dehui Qiu, Ruixin Yan, Ning Lang, Xiuzhuang Zhou

    Abstract: In the field of medical decision-making, precise anomaly detection in medical imaging plays a pivotal role in aiding clinicians. However, previous work is reliant on large-scale datasets for training anomaly detection models, which increases the development cost. This paper first focuses on the task of medical image anomaly detection in the few-shot setting, which is critically significant for the…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, 5 tables, early accepted at MICCAI 2024

  47. arXiv:2405.10202  [pdf, other]

    cs.CL

    Hierarchical Attention Graph for Scientific Document Summarization in Global and Local Level

    Authors: Chenlong Zhao, Xiwen Zhou, Xiaopeng Xie, Yong Zhang

    Abstract: Scientific document summarization has been a challenging task due to the long structure of the input text. The long input hinders the simultaneous effective modeling of both global high-order relations between sentences and local intra-sentence relations which is the most critical step in extractive summarization. However, existing methods mostly focus on one type of relation, neglecting the simul…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted to NAACL 2024 Findings

  48. arXiv:2405.09848  [pdf, other]

    cs.CL cs.AI

    Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling

    Authors: Guangmin Zheng, ** Wang, Xiaobing Zhou, Xuejie Zhang

    Abstract: Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improv…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by LREC-COLING 2024

  49. arXiv:2405.09373  [pdf, other]

    cs.CL

    PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

    Authors: Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap

    Abstract: Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-sca…

    Submitted 20 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  50. arXiv:2405.09034  [pdf, ps, other]

    quant-ph cs.NI

    Entanglement Distribution Delay Optimization in Quantum Networks with Distillation

    Authors: Mahdi Chehimi, Kenneth Goodenough, Walid Saad, Don Towsley, Tony X. Zhou

    Abstract: Quantum networks (QNs) distribute entangled states to enable distributed quantum computing and sensing applications. However, in such QNs, quantum switches (QSs) have limited resources that are highly sensitive to noise and losses and must be carefully allocated to minimize entanglement distribution delay. In this paper, a QS resource allocation framework is proposed, which jointly optimizes the a…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures