Showing 1–50 of 216 results for author: Yan, Q

Searching in archive cs.
  1. arXiv:2406.16531  [pdf, other]

    cs.CV

    GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

    Authors: Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

    Abstract: The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation makes IMDL task unattainable. In this paper, a local manipulation pipeline is designed…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Code page: https://github.com/chenyirui/GIM

  2. arXiv:2406.06028  [pdf, other]

    cs.CV

    ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery

    Authors: Xian Sun, Qiwei Yan, Chubo Deng, Chenglong Liu, Yi Jiang, Zhongyan Hou, Wanxuan Lu, Fanglong Yao, Xiaoyu Liu, Lingxiang Hao, Hongfeng Yu

    Abstract: Scene Graph Generation (SGG) is a high-level visual understanding and reasoning task aimed at extracting entities (such as objects) and their interrelationships from images. Significant progress has been made in the study of SGG in natural images in recent years, but its exploration in the domain of remote sensing images remains very limited. The complex characteristics of remote sensing images ne…

    Submitted 10 June, 2024; originally announced June 2024.

  3. arXiv:2406.02463  [pdf, other]

    cs.CR

    Click Without Compromise: Online Advertising Measurement via Per User Differential Privacy

    Authors: Yingtai Xiao, Jian Du, Shikun Zhang, Qiang Yan, Danfeng Zhang, Daniel Kifer

    Abstract: Online advertising is a cornerstone of the Internet ecosystem, with advertising measurement playing a crucial role in optimizing efficiency. Ad measurement entails attributing desired behaviors, such as purchases, to ad exposures across various platforms, necessitating the collection of user activities across these platforms. As this practice faces increasing restrictions due to rising privacy con…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2405.20421  [pdf, other]

    cs.AI

    Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

    Authors: Qianqi Yan, Xuehai He, Xiang Yue, Xin Eric Wang

    Abstract: Large Multimodal Models (LMMs) have shown remarkable progress in medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that when subjected to simple probing evaluation, state-of-the-art models perform worse than random guessing on medical diagnosis questions. To address thi…

    Submitted 21 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  5. arXiv:2405.20330  [pdf, other]

    cs.CV cs.AI cs.GR

    4DHands: Reconstructing Interactive Hands in 4D with Transformers

    Authors: Dixuan Lin, Yuxiang Zhang, Mengcheng Li, Yebin Liu, Wei Jing, Qi Yan, Qianying Wang, Hongwen Zhang

    Abstract: In this paper, we introduce 4DHands, a robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a transforme…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: More demo videos can be seen at our project page: https://4dhands.github.io

  6. arXiv:2405.08780  [pdf]

    cs.CV cs.AI

    Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling

    Authors: Gregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan Peng

    Abstract: Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and p…

    Submitted 14 May, 2024; originally announced May 2024.

  7. arXiv:2405.03150  [pdf, other]

    cs.CV cs.LG

    Video Diffusion Models: A Survey

    Authors: Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge Ritter

    Abstract: Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends.…

    Submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2405.02791  [pdf, other]

    cs.CV cs.AI

    Efficient Text-driven Motion Generation via Latent Consistency Training

    Authors: Mengxian Hu, Minghao Zhu, Xun Zhou, Qingqing Yan, Shu Li, Chengju Liu, Qijun Chen

    Abstract: Motion diffusion models excel at text-driven motion generation but struggle with real-time inference since motion sequences are time-axis redundant and solving reverse diffusion trajectory involves tens or hundreds of sequential iterations. In this paper, we propose a Motion Latent Consistency Training (MLCT) framework, which allows for large-scale skip sampling of compact motion latent representa…

    Submitted 25 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

  9. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  10. arXiv:2404.14444  [pdf, other]

    cs.LG cs.AI cs.ET

    Practical Battery Health Monitoring using Uncertainty-Aware Bayesian Neural Network

    Authors: Yunyi Zhao, Zhang Wei, Qingyu Yan, Man-Fai Ng, B. Sivaneasan, Cheng Xiang

    Abstract: Battery health monitoring and prediction are critically important in the era of electric mobility with a huge impact on safety, sustainability, and economic aspects. Existing research often focuses on prediction accuracy but tends to neglect practical factors that may hinder the technology's deployment in real-world applications. In this paper, we address these practical considerations and develop…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 6 pages

  11. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, ** Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  12. arXiv:2404.14132  [pdf, other]

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  13. arXiv:2404.13537  [pdf, other]

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  14. arXiv:2404.05802  [pdf, other]

    cs.CE cs.CV cs.MM

    BatSort: Enhanced Battery Classification with Transfer Learning for Battery Sorting and Recycling

    Authors: Yunyi Zhao, Wei Zhang, Erhai Hu, Qingyu Yan, Cheng Xiang, King Jet Tseng, Dusit Niyato

    Abstract: Battery recycling is a critical process for minimizing environmental harm and resource waste for used batteries. However, it is challenging, largely because sorting batteries is costly and hardly automated to group batteries based on battery types. In this paper, we introduce a machine learning-based approach for battery-type classification and address the daunting problem of data scarcity for the…

    Submitted 8 April, 2024; originally announced April 2024.

  15. arXiv:2404.05674  [pdf, other]

    cs.CV

    MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

    Authors: Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang

    Abstract: In this paper, we present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities. As foundational text-to-image models rapidly evolve, the demand for robust image-to-image translation grows. Addressing this need, MoMA specializes in subject-driven personalized image generation. Utilizing an open-source, Multimodal Large Language Model (MLLM), w…

    Submitted 8 April, 2024; originally announced April 2024.

  16. arXiv:2404.00849  [pdf, other]

    cs.CV

    Generating Content for HDR Deghosting from Frequency View

    Authors: Tao Hu, Qingsen Yan, Yuankai Qi, Yanning Zhang

    Abstract: Recovering ghost-free High Dynamic Range (HDR) images from multiple Low Dynamic Range (LDR) images becomes challenging when the LDR images exhibit saturation and significant motion. Recent Diffusion Models (DMs) have been introduced in HDR imaging field, demonstrating promising performance, particularly in achieving visually perceptible results compared to previous DNN-based methods. However, DMs…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024

  17. arXiv:2403.19067  [pdf, other]

    cs.CV

    Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

    Authors: Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

    Abstract: Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge…

    Submitted 27 March, 2024; originally announced March 2024.

  18. arXiv:2403.11445  [pdf, other]

    cs.CR cs.DS eess.SP

    Budget Recycling Differential Privacy

    Authors: Bo Jiang, Jian Du, Sagar Shamar, Qiang Yan

    Abstract: Differential Privacy (DP) mechanisms usually force reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within…

    Submitted 16 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  19. arXiv:2403.05794  [pdf, other]

    cs.CR cs.AI

    Privacy-Preserving Diffusion Model Using Homomorphic Encryption

    Authors: Yaojian Chen, Qiben Yan

    Abstract: In this paper, we introduce a privacy-preserving stable diffusion framework leveraging homomorphic encryption, called HE-Diffusion, which primarily focuses on protecting the denoising phase of the diffusion process. HE-Diffusion is a tailored encryption framework specifically designed to align with the unique architecture of stable diffusion, ensuring both privacy and functionality. To address the…

    Submitted 1 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  20. arXiv:2402.18771  [pdf, other]

    cs.CV cs.RO

    NARUTO: Neural Active Reconstruction from Uncertain Target Observations

    Authors: Ziyue Feng, Huangying Zhan, Zheng Chen, Qingan Yan, Xiangyu Xu, Changjiang Cai, Bing Li, Qilun Zhu, Yi Xu

    Abstract: We present NARUTO, a neural active reconstruction system that combines a hybrid neural representation with uncertainty learning, enabling high-fidelity surface reconstruction. Our approach leverages a multi-resolution hash-grid as the mapping backbone, chosen for its exceptional convergence speed and capacity to capture high-frequency local features. The centerpiece of our work is the incorporation…

    Submitted 16 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR2024. Project page: https://oppo-us-research.github.io/NARUTO-website/. Code: https://github.com/oppo-us-research/NARUTO

  21. arXiv:2402.16641  [pdf, other]

    cs.CV

    Towards Open-ended Visual Quality Comparison

    Authors: Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin

    Abstract: Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as it inherently standardizes the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend the edge of emerging large multi-modality models (LMMs) to further advance visual quality comparison into…

    Submitted 4 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Fix typos

  22. arXiv:2402.13607  [pdf, other]

    cs.CV cs.CL

    CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

    Authors: Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

    Abstract: Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpret…

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  23. arXiv:2402.05809  [pdf, other]

    cs.CV cs.AI

    You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement

    Authors: Qingsen Yan, Yixu Feng, Cheng Zhang, Pei Wang, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang

    Abstract: Low-Light Image Enhancement (LLIE) task tends to restore the details and visual information from corrupted low-light images. Most existing methods learn the mapping function between low/normal-light images by Deep Neural Networks (DNNs) on sRGB and HSV color space. Nevertheless, enhancement involves amplifying image signals, and applying these color spaces to low-light images with a low signal-to-…

    Submitted 17 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Qingsen Yan, Yixu Feng, Cheng Zhang contributed equally to this work. Corresponding author: Yanning Zhang

  24. arXiv:2402.00450  [pdf, other]

    cs.LG

    CPT: Competence-progressive Training Strategy for Few-shot Node Classification

    Authors: Qilong Yan, Yufeng Zhang, Jinghao Zhang, Jingpu Duan, Jian Yin

    Abstract: Graph Neural Networks (GNNs) have made significant advancements in node classification, but their success relies on sufficient labeled nodes per class in the training data. Real-world graph data often exhibits a long-tail distribution with sparse labels, emphasizing the importance of GNNs' ability in few-shot node classification, which entails categorizing nodes with limited data. Traditional epis…

    Submitted 23 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.11972 by other authors

  25. arXiv:2401.15847  [pdf, other]

    cs.CV cs.AI cs.CL

    Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

    Authors: Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang

    Abstract: Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily lives. These images, characterized by their composition of multiple subfigures in distinct layouts, effectively convey information to people. Toward building advanced multimodal AI applications, such as agents that understand complex scenes and navigate through webpages, the skill of multipanel visual reasoning i…

    Submitted 27 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: ACL 2024

  26. arXiv:2401.01130  [pdf, other]

    cs.CV

    Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

    Authors: Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal

    Abstract: In this paper, we present a novel generative task: joint scene graph - image generation. While previous works have explored image generation conditioned on scene graphs or layouts, our task is distinctive and important as it involves generating scene graphs themselves unconditionally from noise, enabling efficient and interpretable control for image generation. Our task is challenging, requiring t…

    Submitted 2 January, 2024; originally announced January 2024.

  27. arXiv:2401.00871  [pdf, other]

    cs.CV

    PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields

    Authors: Zheng Chen, Qingan Yan, Huangying Zhan, Changjiang Cai, Xiangyu Xu, Yuzhong Huang, Weihan Wang, Ziyue Feng, Lantao Liu, Yi Xu

    Abstract: Identifying spatially complete planar primitives from visual data is a crucial task in computer vision. Prior methods are largely restricted to either 2D segment recovery or simplifying 3D structures, even with extensive plane annotations. We present PlanarNeRF, a novel framework capable of detecting dense 3D planes through online learning. Drawing upon the neural field representation, PlanarNeRF…

    Submitted 29 December, 2023; originally announced January 2024.

  28. arXiv:2312.17090  [pdf, other]

    cs.CV cs.CL cs.LG

    Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

    Authors: Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, Weisi Lin

    Abstract: The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents. While recent studies have demonstrated the exceptional potentials of large multi-modality models (LMMs) on a wide range of related fields, in this work, we explore how to teach them for visual rating aligned with human op…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Technical Report

  29. arXiv:2312.08760  [pdf, other]

    cs.CV

    CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning

    Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng

    Abstract: Neural Radiance Fields (NeRF) have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differentiable volume rendering. H…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted at the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI24)

  30. arXiv:2312.07942  [pdf, other]

    cs.SI

    Learning Diffusions under Uncertainty

    Authors: Hao Huang, Qian Yan, Keqi Han, Ting Gan, Jiawei Jiang, Quanqing Xu, Chuanhui Yan

    Abstract: To infer a diffusion network based on observations from historical diffusion processes, existing approaches assume that observation data contain exact occurrence time of each node infection, or at least the eventual infection statuses of nodes in each diffusion process. They determine potential influence relationships between nodes by identifying frequent sequences, or statistical correlations, am…

    Submitted 13 December, 2023; originally announced December 2023.

  31. arXiv:2312.06162  [pdf, other]

    cs.CV

    Textual Prompt Guided Image Restoration

    Authors: Qiuhai Yan, Aiwen Jiang, Kang Chen, Long Peng, Qiaosi Yi, Chunjie Zhang

    Abstract: Image restoration has always been a cutting-edge topic in the academic and industrial fields of computer vision. Since degradation signals are often random and diverse, "all-in-one" models that can do blind image restoration have been concerned in recent years. Early works require training specialized headers and tails to handle each degradation of concern, which are manually cumbersome. Recent wo…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 12 pages, 10 figures

  32. arXiv:2312.06010  [pdf, other]

    cs.CR cs.SD eess.AS

    A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems?

    Authors: Yuanda Wang, Qiben Yan, Nikolay Ivanov, Xun Chen

    Abstract: The emergence of Artificial Intelligence (AI)-driven audio attacks has revealed new security vulnerabilities in voice control systems. While researchers have introduced a multitude of attack strategies targeting voice control systems (VCS), the continual advancements of VCS have diminished the impact of many such attacks. Recognizing this dynamic landscape, our study endeavors to comprehensively a…

    Submitted 4 January, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: 14 pages

  33. arXiv:2312.05616  [pdf, other]

    cs.CV

    Iterative Token Evaluation and Refinement for Real-World Super-Resolution

    Authors: Chaofeng Chen, Shangchen Zhou, Liang Liao, Haoning Wu, Wenxiu Sun, Qiong Yan, Weisi Lin

    Abstract: Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations. Existing methods such as Generative Adversarial Networks (GANs) or continuous diffusion models present their own issues including GANs being difficult to train while continuous diffusion models requiring numerous inference steps. In this paper, we propose…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: To appear in AAAI2024, https://github.com/chaofengc/ITER

  34. arXiv:2311.18118  [pdf, other]

    cs.CR cs.IR

    AnonPSI: An Anonymity Assessment Framework for PSI

    Authors: Bo Jiang, Jian Du, Qiang Yan

    Abstract: Private Set Intersection (PSI) is a widely used protocol that enables two parties to securely compute a function over the intersected part of their shared datasets and has been a significant research focus over the years. However, recent studies have highlighted its vulnerability to Set Membership Inference Attacks (SMIA), where an adversary might deduce an individual's membership by invoking mult…

    Submitted 29 November, 2023; originally announced November 2023.

  35. arXiv:2311.15657  [pdf, other]

    cs.CV

    Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

    Authors: Chaofeng Chen, Annan Wang, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin

    Abstract: Text-to-image diffusion models are typically trained to optimize the log-likelihood objective, which presents challenges in meeting specific requirements for downstream tasks, such as image aesthetics and image-text alignment. Recent research addresses this issue by refining the diffusion U-Net using human rewards through reinforcement learning or direct backpropagation. However, many of them over…

    Submitted 27 November, 2023; originally announced November 2023.

  36. arXiv:2311.13233  [pdf, other]

    cs.CR cs.AI

    A Survey of Adversarial CAPTCHAs on its History, Classification and Generation

    Authors: Zisheng Xu, Qiao Yan, F. Richard Yu, Victor C. M. Leung

    Abstract: Completely Automated Public Turing test to tell Computers and Humans Apart, short for CAPTCHA, is an essential and relatively easy way to defend against malicious attacks implemented by bots. The security and usability trade-off limits the use of massive geometric transformations to interfere deep model recognition and deep models even outperformed humans in complex CAPTCHAs. The discovery of adve…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Submitted to ACM Computing Surveys (Under Review)

  37. arXiv:2311.12052  [pdf, other]

    cs.CV

    MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

    Authors: Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani

    Abstract: In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate a person's new images by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressio…

    Submitted 5 May, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: Accepted by ICML 2024. MagicPose and MagicDance are the same project. Website: https://boese0601.github.io/magicdance/ Code: https://github.com/Boese0601/MagicDance

  38. arXiv:2311.11796  [pdf, other]

    cs.CR cs.AI cs.CL cs.CV

    Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems

    Authors: Guangjing Wang, Ce Zhou, Yuanda Wang, Bocheng Chen, Hanqing Guo, Qiben Yan

    Abstract: Artificial Intelligence (AI) systems such as autonomous vehicles, facial recognition, and speech recognition systems are increasingly integrated into our daily lives. However, despite their utility, these AI systems are vulnerable to a wide range of attacks such as adversarial, backdoor, data poisoning, membership inference, model inversion, and model stealing attacks. In particular, numerous atta…

    Submitted 20 November, 2023; originally announced November 2023.

  39. arXiv:2311.06783  [pdf, other]

    cs.CV cs.MM

    Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

    Authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin

    Abstract: Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, that can respond to a broad range of natural human instructions in a model. While existing foundation models have shown exciting potentials on low-level visual tasks, their related abilities are still preliminary and need to be improved. In order to enhan…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 16 pages, 11 figures, page 12-16 as appendix

  40. arXiv:2311.02572  [pdf, other]

    cs.CV

    Multiple Object Tracking based on Occlusion-Aware Embedding Consistency Learning

    Authors: Yaoqi Hu, Axi Niu, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang

    Abstract: The Joint Detection and Embedding (JDE) framework has achieved remarkable progress for multiple object tracking. Existing methods often employ extracted embeddings to re-establish associations between new detections and previously disrupted tracks. However, the reliability of embeddings diminishes when the region of the occluded object frequently contains adjacent objects or clutters, especially i…

    Submitted 5 November, 2023; originally announced November 2023.

  41. arXiv:2311.00932  [pdf, other]

    cs.CV eess.IV

    Towards High-quality HDR Deghosting with Conditional Diffusion Models

    Authors: Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc Van Gool, Yanning Zhang

    Abstract: High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Networks (DNNs) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images have saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate the HDR deghosting…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: accepted by IEEE TCSVT

  42. arXiv:2310.16389  [pdf, other]

    cs.CV

    MVFAN: Multi-View Feature Assisted Network for 4D Radar Object Detection

    Authors: Qiao Yan, Yihan Wang

    Abstract: 4D radar is recognized for its resilience and cost-effectiveness under adverse weather conditions, thus playing a pivotal role in autonomous driving. While cameras and LiDAR are typically the primary sensors used in perception modules for autonomous vehicles, radar serves as a valuable supplementary sensor. Unlike LiDAR and cameras, radar remains unimpaired by harsh weather conditions, thereby off…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 19 pages, 7 figures, Accepted by ICONIP 2023

  43. arXiv:2310.02417  [pdf, other]

    cs.CR

    Jailbreaker in Jail: Moving Target Defense for Large Language Models

    Authors: Bocheng Chen, Advait Paliwal, Qiben Yan

    Abstract: Large language models (LLMs), known for their capability in understanding and following instructions, are vulnerable to adversarial attacks. Researchers have found that current commercial LLMs either fail to be "harmless" by presenting unethical answers, or fail to be "helpful" by refusing to offer meaningful answers when faced with adversarial queries. To strike a balance between being helpful an…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: MTD Workshop in CCS'23

  44. arXiv:2310.01880  [pdf, other]

    cs.LG

    AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval

    Authors: Qi Yan, Raihan Seraj, Jiawei He, Lili Meng, Tristan Sylvain

    Abstract: Machine-based prediction of real-world events is garnering attention due to its potential for informed decision-making. Whereas traditional forecasting predominantly hinges on structured data like time-series, recent breakthroughs in language models enable predictions using unstructured text. In particular, (Zou et al., 2022) unveils AutoCast, a new benchmark that employs news articles for answeri…

    Submitted 18 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  45. arXiv:2309.14181  [pdf, other]

    cs.CV cs.AI cs.MM

    Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

    Authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin

    Abstract: The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs on low-level visual perception and understanding. To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate pot…

    Submitted 1 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: 27 pages, 11 tables, with updated results

  46. arXiv:2309.06981  [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

    Authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan

    Abstract: Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by MobiCom 2023

  47. arXiv:2309.06960  [pdf, other]

    cs.CR cs.AI cs.HC

    PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

    Authors: Hanqing Guo, Guangjing Wang, Yuanda Wang, Bocheng Chen, Qiben Yan, Li Xiao

    Abstract: In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant amount of queries with a lengthy training sta…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: RAID 2023

  48. arXiv:2308.12001  [pdf, other]

    cs.CV

    Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment

    Authors: Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, Haoning Wu, Qiong Yan, Weisi Lin

    Abstract: Image Quality Assessment (IQA) constitutes a fundamental task within the field of computer vision, yet it remains an unresolved challenge, owing to the intricate distortion conditions, diverse image contents, and limited availability of data. Recently, the community has witnessed the emergence of numerous large-scale pretrained foundation models, which greatly benefit from dramatically increased d…

    Submitted 23 August, 2023; originally announced August 2023.

  49. arXiv:2308.11681  [pdf, other]

    cs.CV cs.MM

    VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

    Authors: Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

    Abstract: The recent contrastive language-image pre-training (CLIP) model has shown great success in a wide range of image-level tasks, revealing remarkable ability for learning powerful visual representations with rich semantics. An open and worthwhile problem is efficiently adapting such a strong model to the video domain and designing a robust video anomaly detector. In this work, we propose VadCLIP, a n…

    Submitted 15 December, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted to AAAI 2024

  50. arXiv:2308.10161   

    cs.CV

    ThermRad: A Multi-modal Dataset for Robust 3D Object Detection under Challenging Conditions

    Authors: Qiao Yan, Yihan Wang

    Abstract: Robust 3D object detection in extreme weather and illumination conditions is a challenging task. While radars and thermal cameras are known for their resilience to these conditions, few studies have been conducted on radar-thermal fusion due to the lack of corresponding datasets. To address this gap, we first present a new multi-modal dataset called ThermRad, which includes a 3D LiDAR, a 4D radar,…

    Submitted 12 September, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: At this time, we have not reached a definitive agreement regarding the ownership and copyright of this dataset. Due to the unresolved issue regarding the dataset, I am writing to formally request the withdrawal of our paper