
Showing 1–50 of 147 results for author: Qi, L

Searching in archive cs.
  1. arXiv:2407.01930

    cs.CV

    Self-Cooperation Knowledge Distillation for Novel Class Discovery

    Authors: Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yunquan Sun, Lizhe Qi

    Abstract: Novel Class Discovery (NCD) aims to discover unknown and novel classes in an unlabeled set by leveraging knowledge already learned about known classes. Existing works focus on instance-level or class-level knowledge representation and build a shared representation space to achieve performance improvements. However, a long-neglected issue is the potentially imbalanced number of samples from known and…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  2. arXiv:2406.19369

    cs.CV

    Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

    Authors: Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy

    Abstract: Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these different architectures. Specifica…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 16 pages; 8 figures

  3. arXiv:2406.16422

    cs.CV cs.AI

    Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting

    Authors: Tiange Zhang, Qing Cai, Feng Gao, Lin Qi, Junyu Dong

    Abstract: Cross-Domain Few-Shot Learning has witnessed great strides with the development of meta-learning. However, most existing methods pay more attention to learning domain-adaptive inductive bias (meta-knowledge) through feature-wise manipulation or task diversity improvement while neglecting the phenomenon that deep networks tend to rely more on high-frequency cues to make the classification decision,…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.12225

    cs.CV

    The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge

    Authors: Hongpeng Pan, Shifeng Yi, Shouwei Yang, Lei Qi, Bing Hu, Yi Xu, Yang Yang

    Abstract: This report introduces an enhanced method for the Foundational Few-Shot Object Detection (FSOD) task, leveraging the vision-language model (VLM) for object detection. However, on specific datasets, VLM may encounter the problem where the detected targets are misaligned with the target concepts of interest. This misalignment hinders the zero-shot performance of VLM and the application of fine-tunin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR2024 Foundational Few-Shot Object Detection Challenge

  5. arXiv:2406.05603

    cs.CY cs.AI

    A Knowledge-Component-Based Methodology for Evaluating AI Assistants

    Authors: Laryn Qi, J. D. Zamfirescu-Pereira, Taehan Kim, Björn Hartmann, John DeNero, Narges Norouzi

    Abstract: We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4, a large language model. This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises. A hint can be requested each time a student fails a test case. Our evaluation addresses three Research Questions: RQ1: Do the hints help students im…

    Submitted 8 June, 2024; originally announced June 2024.

  6. arXiv:2406.05600

    cs.CY

    61A-Bot: AI homework assistance in CS1 is fast and cheap -- but is it helpful?

    Authors: J. D. Zamfirescu-Pereira, Laryn Qi, Björn Hartmann, John DeNero, Narges Norouzi

    Abstract: Chatbot interfaces for LLMs enable students to get immediate, interactive help on homework assignments, but even a thoughtfully-designed bot may not serve all pedagogical goals. In this paper, we report on the development and deployment of a GPT-4-based interactive homework assistant ("61A-Bot") for students in a large CS1 course; over 2000 students made over 100,000 requests of our bot across two…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, 1 table, 1 page of references

  7. arXiv:2406.00121

    cs.CV

    Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations

    Authors: Tiancheng Shen, Jun Hao Liew, Long Mai, Lu Qi, Jiashi Feng, Jiaya Jia

    Abstract: Advances in text-based image generation and editing have revolutionized content creation, enabling users to create impressive content from imaginative text prompts. However, existing methods are not designed to work well with the oversimplified prompts that are often encountered in typical scenarios when users start their editing with only vague or abstract purposes in mind. Those scenarios demand…

    Submitted 31 May, 2024; originally announced June 2024.

  8. arXiv:2405.20282

    cs.CV

    SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

    Authors: Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang

    Abstract: Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation. While existing methods consider them as two distinct tasks, we propose a unified diffusion-based framework (SemFlow) and model them as a pair of reverse problems. Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport be…

    Submitted 30 May, 2024; originally announced May 2024.

  9. arXiv:2405.17427

    cs.CV

    Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model

    Authors: Kuan-Chih Huang, Xiangtai Li, Lu Qi, Shuicheng Yan, Ming-Hsuan Yang

    Abstract: Recent advancements in multimodal large language models (LLMs) have shown their potential in various domains, especially concept reasoning. Despite these developments, applications in understanding 3D environments remain limited. This paper introduces Reason3D, a novel LLM designed for comprehensive 3D understanding. Reason3D takes point cloud data and text prompts as input to produce textual resp…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://KuanchihHuang.github.io/project/reason3d

  10. arXiv:2405.07018

    cs.CR

    Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought

    Authors: Xiaoxiao Chi, Xuyun Zhang, Yan Wang, Lianyong Qi, Amin Beheshti, Xiaolong Xu, Kim-Kwang Raymond Choo, Shuo Wang, Hongsheng Hu

    Abstract: Recommender systems have been successfully applied in many applications. Nonetheless, recent studies demonstrate that recommender systems are vulnerable to membership inference attacks (MIAs), leading to the leakage of users' membership privacy. However, existing MIAs relying on shadow training suffer a large performance drop when the attacker lacks knowledge of the training data distribution and…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by IJCAI-24

  11. arXiv:2404.18560

    math.OC cs.RO

    Non-convex Pose Graph Optimization in SLAM via Proximal Linearized Riemannian ADMM

    Authors: Xin Chen, Chunfeng Cui, Deren Han, Liqun Qi

    Abstract: Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and th…

    Submitted 29 April, 2024; originally announced April 2024.

  12. arXiv:2404.10343

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  13. arXiv:2404.08951

    cs.CV cs.LG

    Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

    Authors: Qinghe Ma, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

    Abstract: Both limited annotation and domain shift are prevalent challenges in medical image segmentation. Traditional semi-supervised segmentation and unsupervised domain adaptation methods address one of these issues separately. However, the coexistence of limited annotation and domain shift is quite common, which motivates us to introduce a novel and challenging scenario: Mixed Domain Semi-supervised med…

    Submitted 13 April, 2024; originally announced April 2024.

  14. RMAFF-PSN: A Residual Multi-Scale Attention Feature Fusion Photometric Stereo Network

    Authors: Kai Luo, Yakun Ju, Lin Qi, Kaixuan Wang, Junyu Dong

    Abstract: Predicting accurate normal maps of objects from two-dimensional images in regions of complex structure and spatial material variations is challenging using photometric stereo methods due to the influence of surface reflection properties caused by variations in object geometry and surface materials. To address this issue, we propose a photometric stereo network called RMAFF-PSN that uses residual…

    Submitted 14 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: 17 pages, 12 figures

    Journal ref: Photonics 2023, 10(5), 548

  15. arXiv:2403.19539

    cs.CV

    De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

    Authors: Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi

    Abstract: Data-Free Knowledge Distillation (DFKD) is a promising task to train high-performance small models to enhance actual deployment without relying on the original training data. Existing methods commonly avoid relying on private data by utilizing synthetic or sampled data. However, a long-overlooked issue is that the severe distribution shifts between their substitution and original data, which manif…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR24

  16. arXiv:2403.19213

    cs.CV

    Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting

    Authors: Weihao Jiang, Zhaozhi Xie, Yuxiang Lu, Longjie Qi, Jingyong Cai, Hiroyuki Uchiyama, Bin Chen, Yue Ding, Hongtao Lu

    Abstract: Mask-guided matting networks have achieved significant improvements and have shown great potential in practical applications in recent years. However, by simply learning matting representations from synthetic matting data lacking real-world diversity, these approaches tend to overfit low-level details in wrong regions, lack generalization to objects with complex structures and real-world scenes su…

    Submitted 28 March, 2024; originally announced March 2024.

  17. arXiv:2403.16697

    cs.CV

    DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization

    Authors: Yunlong Tang, Yuxuan Wan, Lei Qi, Xin Geng

    Abstract: Source-Free Domain Generalization (SFDG) aims to develop a model that works for unseen target domains without relying on any source domain. Recent work, PromptStyler, employs text prompts to simulate different distribution shifts in the joint vision-language space, allowing the model to generalize effectively to unseen domains without using any images. However, 1) PromptStyler's style generation s…

    Submitted 25 March, 2024; originally announced March 2024.

  18. arXiv:2403.11792

    cs.CV

    SETA: Semantic-Aware Token Augmentation for Domain Generalization

    Authors: Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: Domain generalization (DG) aims to enhance the model robustness against domain shifts without accessing target domains. A prevalent category of methods for DG is data augmentation, which focuses on generating virtual samples to simulate domain shifts. However, existing augmentation techniques in DG are mainly tailored for convolutional neural networks (CNNs), with limited exploration in token-base…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 13 pages, 6 figures

  19. arXiv:2403.11229

    cs.CV

    Concatenate, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation

    Authors: Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao

    Abstract: Segment Anything Model (SAM) fine-tuning has shown remarkable performance in medical image segmentation in a fully supervised manner, but requires precise annotations. To reduce the annotation cost and maintain satisfactory performance, in this work, we leverage the capabilities of SAM for establishing semi-supervised medical image segmentation models. Rethinking the requirements of effectiveness,…

    Submitted 17 March, 2024; originally announced March 2024.

  20. arXiv:2403.09616

    cs.CV

    Explore In-Context Segmentation via Latent Diffusion Models

    Authors: Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan

    Abstract: In-context segmentation has drawn more attention with the introduction of vision foundation models. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. In this work, we explore this problem from a new perspective, using one representative generation model, the latent diffusion model (LDM). We observe a tas…

    Submitted 14 March, 2024; originally announced March 2024.

  21. arXiv:2402.19145

    cs.CV

    A SAM-guided Two-stream Lightweight Model for Anomaly Detection

    Authors: Chenghao Li, Lei Qi, Xin Geng

    Abstract: In industrial anomaly detection, model efficiency and mobile-friendliness become the primary concerns in real-world applications. Simultaneously, the impressive generalization capabilities of Segment Anything (SAM) have garnered broad academic attention, making it an ideal choice for localizing unseen anomalies and diverse real-world patterns. In this paper, considering these two critical factors,…

    Submitted 29 February, 2024; originally announced February 2024.

  22. arXiv:2402.14008

    cs.CL

    OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

    Authors: Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun

    Abstract: Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains. With traditional benchmarks becoming less challenging for these models, new rigorous challenges are essential to gauge their advanced abilities. In this work, we present Olym…

    Submitted 6 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 (main), update

  23. arXiv:2402.10821

    cs.CV

    Training Class-Imbalanced Diffusion Model Via Overlap Optimization

    Authors: Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

    Abstract: Diffusion models have made significant advances recently in high-quality image synthesis and related tasks. However, diffusion models trained on real-world datasets, which often follow long-tailed distributions, yield inferior fidelity for tail classes. Deep generative models, including diffusion models, are biased towards classes with abundant training images. To address the observed appearance o…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Technical Report

  24. arXiv:2402.02555

    cs.CV cs.CL

    Generalizable Entity Grounding via Assistance of Large Language Model

    Authors: Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang

    Abstract: In this work, we propose a novel approach to densely ground visual entities from a long caption. We leverage a large multimodal model (LMM) to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and the proposed multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask. Additionally, we introduce a stra…

    Submitted 4 February, 2024; originally announced February 2024.

  25. arXiv:2401.10228

    cs.CV

    RAP-SAM: Towards Real-Time All-Purpose Segment Anything

    Authors: Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang

    Abstract: Advanced by transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in real time, which makes it difficult to transfer them into several products. On the other hand, current real-time segmentation mainl…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Project Page: https://xushilin1.github.io/rap_sam/

  26. arXiv:2401.05752

    cs.CV

    Learning Generalizable Models via Disentangling Spurious and Enhancing Potential Correlations

    Authors: Na Wang, Lei Qi, Jintao Guo, Yinghuan Shi, Yang Gao

    Abstract: Domain generalization (DG) intends to train a model on multiple source domains to ensure that it can generalize well to an arbitrary unseen target domain. The acquisition of domain-invariant representations is pivotal for DG as they possess the ability to capture the inherent semantic information of the data, mitigate the influence of domain shift, and enhance the generalization capability of the…

    Submitted 11 January, 2024; originally announced January 2024.

  27. arXiv:2312.16983

    cs.LG cs.AI

    PG-LBO: Enhancing High-Dimensional Bayesian Optimization with Pseudo-Label and Gaussian Process Guidance

    Authors: Taicai Chen, Yue Duan, Dong Li, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: Variational Autoencoder based Bayesian Optimization (VAE-BO) has demonstrated its excellent performance in addressing high-dimensional structured optimization problems. However, current mainstream methods overlook the potential of utilizing a pool of unlabeled data to construct the latent space, while only concentrating on designing sophisticated models to leverage the labeled data. Despite their…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  28. arXiv:2312.12237

    cs.LG

    Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

    Authors: Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

    Abstract: While semi-supervised learning (SSL) has yielded promising results, the more realistic SSL scenario remains to be explored, in which the unlabeled data exhibits extremely high recognition difficulty, e.g., fine-grained visual classification in the context of SSL (SS-FGVC). The increased recognition difficulty on fine-grained unlabeled data spells disaster for pseudo-labeling accuracy, resulting in…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  29. arXiv:2312.06630

    cs.CV

    TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

    Authors: Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

    Abstract: Training on large-scale datasets can boost the performance of video instance segmentation while the annotated datasets for VIS are hard to scale up due to the high labor cost. What we possess are numerous isolated field-specific datasets; thus, it is appealing to jointly train models across the aggregation of datasets to enhance data volume and diversity. However, due to the heterogeneity in categ…

    Submitted 17 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  30. arXiv:2312.05538

    cs.CV

    CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen

    Authors: Hao Zhang, Fang Li, Lu Qi, Ming-Hsuan Yang, Narendra Ahuja

    Abstract: Addressing Out-Of-Distribution (OOD) Segmentation and Zero-Shot Semantic Segmentation (ZS3) is challenging, necessitating segmenting unseen classes. Existing strategies adapt the class-agnostic Mask2Former (CA-M2F) tailored to specific tasks. However, these methods cater to singular tasks, demand training from scratch, and we demonstrate certain deficiencies in CA-M2F, which affect performance. We…

    Submitted 8 February, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  31. arXiv:2312.02697

    cs.RO

    Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes

    Authors: Hecheng Wang, Lizhe Qi, Bin Fang, Yunquan Sun

    Abstract: In this work, we focus on addressing the long-horizon manipulation tasks in densely cluttered scenes. Such tasks require policies to effectively manage severe occlusions among objects and continually produce actions based on visual observations. We propose a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM). It employs a high-level policy and three options to se…

    Submitted 5 December, 2023; originally announced December 2023.

  32. arXiv:2312.01985

    cs.CV

    UniGS: Unified Representation for Image Generation and Segmentation

    Authors: Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang

    Abstract: This paper introduces a novel unified representation of diffusion models for image generation and segmentation. Specifically, we use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers while aligning the representation closely with the image RGB domain. Two novel modules, including the location-aware color palette and progressive dichotomy module, are pro…

    Submitted 4 December, 2023; originally announced December 2023.

  33. arXiv:2312.01734

    cs.CV

    Effective Adapter for Face Recognition in the Wild

    Authors: Yunhao Liu, Yu-Ju Tsai, Kelvin C. K. Chan, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

    Abstract: In this paper, we tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions. Traditional heuristic approaches (either training models directly on these degraded images or on their enhanced counterparts using face restoration techniques) have proven ineffective, primarily due to the degradation of facial features and the discrepancy in im…

    Submitted 3 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  34. arXiv:2312.01677   

    cs.CV

    Multi-task Image Restoration Guided By Robust DINO Features

    Authors: Xin Lin, Chao Ren, Kelvin C. K. Chan, Lu Qi, Jinshan Pan, Ming-Hsuan Yang

    Abstract: Multi-task image restoration has gained significant interest due to its inherent versatility and efficiency compared to its single-task counterpart. Despite its potential, performance degradation is observed with an increase in the number of tasks, primarily attributed to the distinct nature of each restoration task. Addressing this challenge, we introduce DINO-IR, a novel multi-ta…

    Submitted 5 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Some important information needs to be added

  35. arXiv:2311.17121

    cs.CV cs.LG

    ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation

    Authors: Jacob Schnell, Jieke Wang, Lu Qi, Vincent Tao Hu, Meng Tang

    Abstract: Recent advances in generative models, such as diffusion models, have made generating high-quality synthetic images widely accessible. Prior works have shown that training on synthetic images improves many perception tasks, such as image classification, object detection, and semantic segmentation. We are the first to explore generative data augmentations for scribble-supervised semantic segmentatio…

    Submitted 16 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  36. arXiv:2311.16556

    cs.LG

    Scalable Label Distribution Learning for Multi-Label Classification

    Authors: Xingyu Zhao, Yuexuan An, Lei Qi, Xin Geng

    Abstract: Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric, which is violated in many real-world scenarios. Moreover, most existing methods design learning processes associated with the number of labels, which makes their co…

    Submitted 28 November, 2023; originally announced November 2023.

  37. arXiv:2311.13198

    cs.CV

    DoubleAUG: Single-domain Generalized Object Detector in Urban via Color Perturbation and Dual-style Memory

    Authors: Lei Qi, Peng Dong, Tan Xiong, Hui Xue, Xin Geng

    Abstract: Object detection in urban scenarios is crucial for autonomous driving in intelligent traffic systems. However, unlike conventional object detection tasks, urban-scene images vary greatly in style. For example, images taken on sunny days differ significantly from those taken on rainy days. Therefore, models trained on sunny day images may not generalize well to rainy day images. In this paper, we a…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

  38. arXiv:2311.12085

    cs.CV

    Pyramid Diffusion for Fine 3D Large Scene Generation

    Authors: Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang

    Abstract: Directly transferring the 2D techniques to 3D scene generation is challenging due to significant resolution reduction and the scarcity of comprehensive real-world 3D scene datasets. To address these issues, our work introduces the Pyramid Discrete Diffusion model (PDD) for 3D scene generation. This novel approach employs a multi-scale model capable of progressively generating high-quality 3D scene…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Project page: https://yuheng.ink/project-page/pyramid-discrete-diffusion/

  39. arXiv:2311.03352

    cs.CV

    Rethinking Evaluation Metrics of Open-Vocabulary Segmentation

    Authors: Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

    Abstract: In this paper, we highlight a problem of evaluation metrics adopted in the open-vocabulary segmentation. That is, the evaluation process still heavily relies on closed-set metrics on zero-shot or cross-dataset pipelines without considering the similarity between predicted and ground truth categories. To tackle this issue, we first survey eleven similarity measurements between two categorical words…

    Submitted 6 November, 2023; originally announced November 2023.

  40. arXiv:2310.17260

    cs.CY

    Socially Beneficial Metaverse: Framework, Technologies, Applications, and Challenges

    Authors: Xiaolong Xu, Xuanhong Zhou, Muhammad Bilal, Sherali Zeadally, Jon Crowcroft, Lianyong Qi, Shengjun Xue

    Abstract: In recent years, the maturation of emerging technologies such as Virtual Reality, Digital twins, and Blockchain has accelerated the realization of the metaverse. As a virtual world independent of the real world, the metaverse will provide users with a variety of virtual activities that bring great convenience to society. In addition, the metaverse can facilitate digital twins, which offers transfo…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 28 pages, 6 figures, 3 tables

    MSC Class: 68U01; 68M11; 68U35 ACM Class: A.1; K.4

  41. arXiv:2309.16140

    cs.MM cs.CV

    CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting

    Authors: Shaoxiang Guo, Qing Cai, Lin Qi, Junyu Dong

    Abstract: Contrastive Language-Image Pre-training (CLIP) starts to emerge in many computer vision tasks and has achieved promising performance. However, it remains underexplored whether CLIP can be generalized to 3D hand pose estimation, as bridging text prompts with pose-aware features presents significant challenges due to the discrete nature of joint positions in 3D space. In this paper, we make one of t…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted in the Proceedings of the 31st ACM International Conference on Multimedia (MM '23)

  42. arXiv:2309.06337

    cs.CV

    Exploring Flat Minima for Domain Generalization with Large Learning Rates

    Authors: Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: Domain Generalization (DG) aims to generalize to arbitrary unseen domains. A promising approach to improve model generalization in DG is the identification of flat minima. One typical method for this task is SWAD, which involves averaging weights along the training trajectory. However, the success of weight averaging depends on the diversity of weights, which is limited when training with a small…

    Submitted 12 September, 2023; originally announced September 2023.

  43. arXiv:2309.03598

    cs.CV

    Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning

    Authors: Guan Gui, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

    Abstract: In semi-supervised learning, unlabeled samples can be utilized through augmentation and consistency regularization. However, we observed that certain samples, even under strong augmentation, are still correctly classified with high confidence, resulting in a loss close to zero. This indicates that these samples have already been learned well and do not provide any additional optimization benefits to…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted at the International Conference on Computer Vision (ICCV) 2023

    Journal ref: International Conference on Computer Vision (ICCV) 2023

  44. arXiv:2309.03004  [pdf, other]

    cs.LG

    A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness

    Authors: Ze Peng, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: A recent empirical observation (Li et al., 2022b) of activation sparsity in MLP blocks offers an opportunity to drastically reduce computation costs for free. Although having attributed it to training dynamics, existing theoretical explanations of activation sparsity are restricted to shallow networks, small training steps and special training, despite its emergence in deep models standardly train…

    Submitted 26 October, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  45. arXiv:2308.13168  [pdf, other]

    cs.CV cs.LG

    IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization

    Authors: Zekun Li, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: Semi-supervised learning (SSL) aims to leverage massive unlabeled data when labels are expensive to obtain. Unfortunately, in many real-world applications, the collected unlabeled data will inevitably contain unseen-class outliers not belonging to any of the labeled classes. To deal with the challenging open-set SSL task, the mainstream methods tend to first detect outliers and then filter them ou…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023, selected for an Oral presentation

  46. arXiv:2308.10297  [pdf, other]

    cs.CV

    DomainAdaptor: A Novel Approach to Test-time Adaptation

    Authors: Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: To deal with the domain shift between training and test samples, current methods have primarily focused on learning generalizable features during training and ignore the specificity of unseen samples that are also critical during the test. In this paper, we investigate a more challenging task that aims to adapt a trained CNN model to unseen domains during the test. To maximumly mine the informatio…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  47. arXiv:2308.10285  [pdf, other]

    cs.CV

    DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization

    Authors: Jintao Guo, Lei Qi, Yinghuan Shi

    Abstract: Deep Neural Networks have exhibited considerable success in various visual tasks. However, when applied to unseen test datasets, state-of-the-art models often suffer performance degradation due to domain shifts. In this paper, we introduce a novel approach for domain generalization from a novel perspective of enhancing the robustness of channels in feature maps to domain shifts. We observe that mo…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023. The code is available at https://github.com/lingeringlight/DomainDrop

  48. arXiv:2308.09391  [pdf, other]

    cs.CV

    Generalizable Decision Boundaries: Dualistic Meta-Learning for Open Set Domain Generalization

    Authors: Xiran Wang, Jian Zhang, Lei Qi, Yinghuan Shi

    Abstract: Domain generalization (DG) is proposed to deal with the issue of domain shift, which occurs when statistical differences exist between source and target domains. However, most current methods do not account for a common realistic scenario where the source and target domains have different classes. To overcome this deficiency, open set domain generalization (OSDG) then emerges as a more practical s…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 10 pages, 5 figures, accepted by ICCV 2023

  49. arXiv:2308.08872  [pdf, other]

    cs.LG cs.CV

    Towards Semi-supervised Learning with Non-random Missing Labels

    Authors: Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

    Abstract: Semi-supervised learning (SSL) tackles the label missing problem by enabling the effective usage of unlabeled data. While existing SSL methods focus on the traditional setting, a practical and challenging scenario called label Missing Not At Random (MNAR) is usually ignored. In MNAR, the labeled and unlabeled data fall into different class distributions resulting in biased label imputation, which…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  50. arXiv:2308.07314  [pdf, other]

    cs.CV

    Dual Associated Encoder for Face Restoration

    Authors: Yu-Ju Tsai, Yu-Lun Liu, Lu Qi, Kelvin C. K. Chan, Ming-Hsuan Yang

    Abstract: Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild. The existing codebook prior mitigates the ill-posedness by leveraging an autoencoder and learned codebook of high-quality (HQ) features, achieving remarkable quality. However, existing approaches in this paradigm frequently depend on a singl…

    Submitted 20 January, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: ICLR 2024. Project page: https://liagm.github.io/DAEFR/