Showing 1–50 of 60 results for author: Qian, Q

Searching in archive cs.
  1. arXiv:2407.01219  [pdf, other]

    cs.CL

    Searching for Best Practices in Retrieval-Augmented Generation

    Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00608  [pdf, other]

    cs.AI cs.CL cs.CV

    Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

    Authors: Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

    Abstract: Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However, previous methods solely focus on the performance of the reconstruction task, degrading its ability to combine with different textual prompt. Besides, optimizin…

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2404.15655  [pdf, other]

    cs.CV

    Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering

    Authors: Jiawei Yao, Qi Qian, Juhua Hu

    Abstract: Multiple clustering has gained significant attention in recent years due to its potential to reveal multiple hidden structures of data from different perspectives. The advent of deep multiple clustering techniques has notably advanced the performance by uncovering complex patterns and relationships within large datasets. However, a major challenge arises as users often do not need all the clusteri…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://github.com/Alexander-Yao/Multi-MaP

  4. arXiv:2401.06040  [pdf, other]

    cs.LG

    Wavelet-Inspired Multiscale Graph Convolutional Recurrent Network for Traffic Forecasting

    Authors: Qipeng Qian, Tanwi Mallick

    Abstract: Traffic forecasting is the foundation for intelligent transportation systems. Spatiotemporal graph neural networks have demonstrated state-of-the-art performance in traffic forecasting. However, these methods do not explicitly model some of the natural characteristics in traffic data, such as the multiscale structure that encompasses spatial and temporal variations at different levels of granulari…

    Submitted 4 March, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  5. arXiv:2311.18248  [pdf, other]

    cs.MM cs.CL

    mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

    Authors: Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

    Abstract: Recently, the strong text creation ability of Large Language Models(LLMs) has given rise to many tools for assisting paper reading or even writing. However, the weak diagram analysis abilities of LLMs or Multimodal LLMs greatly limit their application scenarios, especially for scientific academic paper writing. In this work, towards a more versatile copilot for academic paper writing, we mainly fo…

    Submitted 9 January, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 20 pages, 12 figures

  6. arXiv:2311.15800  [pdf]

    cs.CY

    Public sentiment analysis and topic modeling regarding ChatGPT in mental health on Reddit: Negative sentiments increase over time

    Authors: Yunna Cai, Fan Wang, Haowei Wang, Qianwen Qian

    Abstract: In order to uncover users' attitudes towards ChatGPT in mental health, this study examines public opinions about ChatGPT in mental health discussions on Reddit. Researchers used the bert-base-multilingual-uncased-sentiment techniques for sentiment analysis and the BERTopic model for topic modeling. It was found that overall, negative sentiments prevail, followed by positive ones, with neutral sent…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 11 pages, 8 figures, 2 tables

  7. arXiv:2311.14310  [pdf, other]

    cs.CV

    Stable Cluster Discrimination for Deep Clustering

    Authors: Qi Qian

    Abstract: Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution (i.e., clustering) simultaneously, which demonstrates a superior performance over conventional clustering methods with given features. However, the coupled objective implies a trivial solution that all instances collapse to the uniform features. To tackle the challen…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: accepted by ICCV'23

  8. arXiv:2311.07577  [pdf, ps, other]

    cs.CV eess.IV

    Algorithms for Object Detection in Substations

    Authors: Bingying Jin, Yadong Liu, Qinlin Qian

    Abstract: Inspection of high-voltage power equipment is an effective way to ensure power supply reliability. Object recognition, one of the key technologies in automatic power equipment inspection, attracts attention of many researchers and engineers. Although quite a few existing models have some their own advantages, object relationship between equipment which is very important in this task is scarcely co…

    Submitted 23 September, 2023; originally announced November 2023.

  9. arXiv:2311.04257  [pdf, other]

    cs.CL cs.CV

    mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

    Authors: Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks…

    Submitted 8 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  10. arXiv:2310.19752  [pdf, other]

    cs.CV cs.LG

    Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP

    Authors: Qi Qian, Yuanhong Xu, Juhua Hu

    Abstract: Vision-language pre-training methods, e.g., CLIP, demonstrate an impressive zero-shot performance on visual categorizations with the class proxy from the text embedding of the class name. However, the modality gap between the text and vision space can result in a sub-optimal performance. We theoretically show that the gap cannot be reduced sufficiently by minimizing the contrastive loss in CLIP an…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: accepted by NeurIPS'23

  11. arXiv:2310.05126  [pdf, other]

    cs.CV cs.AI

    UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

    Authors: Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

    Abstract: Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs. In this work, we propose UReader, a first exploration of universal OCR-free visually-situated language understanding based on the Multimodal Large Language Model (MLLM). By leveraging the shallow text recognition ability of the MLLM, we only finetuned 1.2% parameters and…

    Submitted 8 October, 2023; originally announced October 2023.

  12. arXiv:2310.04257  [pdf, other]

    cs.NE cs.RO

    On Solving Close Enough Orienteering Problems with Overlapped Neighborhoods

    Authors: Qiuchen Qian, Yanran Wang, David Boyle

    Abstract: Close Enough Traveling Salesman Problem (CETSP) is a well-known variant of TSP whereby the agent may complete its mission at any point within a target neighborhood. Heuristics based on overlapped neighborhoods, known as Steiner Zones (SZ), have gained attention in addressing CETSP. While SZs offer effective approximations to the original graph, their inherent overlap imposes constraints on search…

    Submitted 15 May, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 30 pages, 11 figures

  13. arXiv:2309.04145  [pdf, other]

    cs.CV

    Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM

    Authors: Weijian Xie, Guanyi Chu, Quanhao Qian, Yihao Yu, Hai Li, Danpeng Chen, Shangjin Zhai, Nan Wang, Hujun Bao, Guofeng Zhang

    Abstract: Dense SLAM based on monocular cameras does indeed have immense application value in the field of AR/VR, especially when it is performed on a mobile device. In this paper, we propose a novel method that integrates a light-weight depth completion network into a sparse SLAM system using a multi-basis depth representation, so that dense mapping can be performed online even on a mobile phone. Specifica…

    Submitted 20 September, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

  14. arXiv:2307.07084  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY

    Probabilistic Constrained Reinforcement Learning with Formal Interpretability

    Authors: Yanran Wang, Qiuchen Qian, David Boyle

    Abstract: Reinforcement learning can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and the corresponding optimal policy. Consequently, representing sequential decision-making problems as probabilistic inference can have considerable value, as, in…

    Submitted 17 June, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: 25 pages, 9 figures, containing Appendix

  15. arXiv:2306.08792  [pdf, other]

    cs.CV

    Graph Convolution Based Efficient Re-Ranking for Visual Retrieval

    Authors: Yuqi Zhang, Qi Qian, Hongsong Wang, Chong Liu, Weihua Chen, Fan Wang

    Abstract: Visual retrieval tasks such as image retrieval and person re-identification (Re-ID) aim at effectively and thoroughly searching images with similar content or the same identity. After obtaining retrieved examples, re-ranking is a widely adopted post-processing step to reorder and improve the initial retrieval results by making use of the contextual information from semantically neighboring samples…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Code is publicly available: https://github.com/WesleyZhang1991/GCN_rerank

  16. arXiv:2306.04362  [pdf, other]

    cs.CV cs.CL

    Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

    Authors: Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

    Abstract: To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Model (LLM) in the Chinese community, we firstly release the largest public Chinese high-quality video-language dataset named Youku-mPLUG, which is collected from Youku, a well-known Chinese video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million Chi…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Work in progress

  17. arXiv:2304.14178  [pdf, other]

    cs.CL cs.CV cs.LG

    mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

    Authors: Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel training paradigm that equips LLMs with multi-modal abilities through modularized learning of foundation LLM, a visual knowledge module, and a visual abstrac…

    Submitted 29 March, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Work in progress

  18. arXiv:2304.07849  [pdf, other]

    cs.CL

    ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

    Authors: Junfeng Tian, Hehong Chen, Guohai Xu, Ming Yan, Xing Gao, Jianhai Zhang, Chenliang Li, Jiayi Liu, Wenshen Xu, Haiyang Xu, Qi Qian, Wei Wang, Qinghao Ye, Jiejing Zhang, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format. Different from other open-domain dialogue models that focus on large-scale pre-training and scaling up model size or dialogue corpus, we aim to build a powerful and practical dialogue system for…

    Submitted 15 May, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

    Comments: 36 pages

  19. arXiv:2304.01489  [pdf, other]

    cs.CV

    Improved Visual Fine-tuning with Natural Language Supervision

    Authors: Junyang Wang, Yuanhong Xu, Juhua Hu, Ming Yan, Jitao Sang, Qi Qian

    Abstract: Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with limited training examples. While the problem of catastrophic forgetting in pre-trained backbone has been extensively studied for fine-tuning, its potential bias from the corresponding pre-training task and data, attrac…

    Submitted 14 August, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: accepted by ICCV'23

  20. arXiv:2304.01290  [pdf, other]

    cs.RO

    A Simple Approach for General Task-Oriented Picking using Placing constraints

    Authors: Jen-Wei Wang, Lingfeng Sun, Xinghao Zhu, Qiyang Qian, Masayoshi Tomizuka

    Abstract: Pick-and-place is an important manipulation task in domestic or manufacturing applications. There exist many works focusing on grasp detection with high picking success rate but lacking consideration of downstream manipulation tasks (e.g., placing). Although some research works proposed methods to incorporate task conditions into grasp selection, most of them are data-driven and are therefore hard…

    Submitted 3 April, 2023; originally announced April 2023.

  21. arXiv:2302.00402  [pdf, other]

    cs.CV cs.CL cs.MM

    mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

    Authors: Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou

    Abstract: Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement. In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or…

    Submitted 1 February, 2023; originally announced February 2023.

    Journal ref: ICML2023

  22. arXiv:2212.14546  [pdf, other]

    cs.CV cs.CL cs.MM

    HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

    Authors: Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang

    Abstract: Video-language pre-training has advanced the performance of various downstream video-language tasks. However, most previous methods directly inherit or adapt typical image-language pre-training paradigms to video-language pre-training, thus not fully exploiting the unique characteristic of video, i.e., temporal. In this paper, we propose a Hierarchical Temporal-Aware video-language pre-training fr…

    Submitted 29 December, 2022; originally announced December 2022.

  23. arXiv:2208.02803  [pdf, other]

    cs.LG math.AT

    Semantic Data Augmentation based Distance Metric Learning for Domain Generalization

    Authors: Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, Hao Li

    Abstract: Domain generalization (DG) aims to learn a model on one or more different but related source domains that could be generalized into an unseen target domain. Existing DG methods try to prompt the diversity of source domains for the model's generalization ability, while they may have to introduce auxiliary networks or striking computational costs. On the contrary, this work applies the implicit sema…

    Submitted 13 September, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: Accepted to ACM MM 2022

  24. arXiv:2207.07789  [pdf, other]

    cs.RO eess.SY

    QuaDUE-CCM: Interpretable Distributional Reinforcement Learning using Uncertain Contraction Metrics for Precise Quadrotor Trajectory Tracking

    Authors: Yanran Wang, James O'Keeffe, Qiuchen Qian, David Boyle

    Abstract: Accuracy and stability are common requirements for Quadrotor trajectory tracking systems. Designing an accurate and stable tracking controller remains challenging, particularly in unknown and dynamic environments with complex aerodynamic disturbances. We propose a Quantile-approximation-based Distributional-reinforced Uncertainty Estimator (QuaDUE) to accurately identify the effects of aerodynamic…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: 18 pages, 9 figures, Quadrotor trajectory tracking, Learning-based control

  25. arXiv:2205.12753  [pdf, other]

    cs.CV cs.LG

    An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation

    Authors: Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan

    Abstract: The performance of machine learning models under distribution shift has been the focus of the community in recent years. Most of current methods have been proposed to improve the robustness to distribution shift from the algorithmic perspective, i.e., designing better training algorithms to help the generalization in shifted test distributions. This paper studies the distribution shift problem fro…

    Submitted 25 May, 2022; originally announced May 2022.

  26. arXiv:2205.08924  [pdf, other]

    cs.CV

    Financial Time Series Data Augmentation with Generative Adversarial Networks and Extended Intertemporal Return Plots

    Authors: Justin Hellermann, Qinzhuan Qian, Ankit Shah

    Abstract: Data augmentation is a key regularization method to support the forecast and classification performance of highly parameterized models in computer vision. In the time series domain however, regularization in terms of augmentation is not equally common even though these methods have proven to mitigate effects from small sample size or non-stationarity. In this paper we apply state-of-the art image-…

    Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

  27. arXiv:2205.07150  [pdf, other]

    eess.SY cs.AI cs.LG cs.RO

    Interpretable Stochastic Model Predictive Control using Distributional Reinforced Estimation for Quadrotor Tracking Systems

    Authors: Yanran Wang, James O'Keeffe, Qiuchen Qian, David Boyle

    Abstract: This paper presents a novel trajectory tracker for autonomous quadrotor navigation in dynamic and complex environments. The proposed framework integrates a distributional Reinforcement Learning (RL) estimator for unknown aerodynamic effects into a Stochastic Model Predictive Controller (SMPC) for trajectory tracking. Aerodynamic effects derived from drag forces and moment variations are difficult…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 8 pages, 4 figures

  28. arXiv:2204.02251  [pdf, other]

    cs.CV

    RBGNet: Ray-based Grouping for 3D Object Detection

    Authors: Haiyang Wang, Shaoshuai Shi, Ze Yang, Rongyao Fang, Qi Qian, Hongsheng Li, Bernt Schiele, Liwei Wang

    Abstract: As a fundamental problem in computer vision, 3D object detection is experiencing rapid growth. To extract the point-wise features from the irregularly and sparsely distributed points, previous methods usually take a feature grouping module to aggregate the point features to an object candidate. However, these methods have not yet leveraged the surface geometry of foreground objects to enhance grou…

    Submitted 5 April, 2022; originally announced April 2022.

  29. arXiv:2203.04595  [pdf, other]

    cs.RO

    Practical Mission Planning for Optimized UAV-Sensor Wireless Recharging

    Authors: Qiuchen Qian, James O'Keeffe, Yanran Wang, David Boyle

    Abstract: Optimal maintenance of sensor nodes in a Wireless Rechargeable Sensor Network (WRSN) requires effective scheduling of power delivery vehicles by solving the Charging Scheduling Problem (CSP). Deploying Unmanned Aerial Vehicles (UAVs) as mobile chargers has emerged as a promising solution due to their mobility and flexibility. The CSP can be formulated as a Mixed-Integer Non-Linear Programming prob…

    Submitted 14 April, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 15 pages, 13 figures

  30. arXiv:2202.12419  [pdf, other]

    cs.RO

    KinoJGM: A framework for efficient and accurate quadrotor trajectory generation and tracking in dynamic environments

    Authors: Yanran Wang, James O'Keeffe, Qiuchen Qian, David Boyle

    Abstract: Unmapped areas and aerodynamic disturbances render autonomous navigation with quadrotors extremely challenging. To fly safely and efficiently, trajectory planners and trackers must be able to navigate unknown environments with unpredictable aerodynamic effects in real-time. When encountering aerodynamic effects such as strong winds, most current approaches to quadrotor trajectory planning and trac…

    Submitted 11 March, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: 7 pages, 8 figures; accepted at IEEE International Conference on Robotics and Automation 2022

  31. arXiv:2202.11484  [pdf, other]

    cs.CV

    Reconstruction Task Finds Universal Winning Tickets

    Authors: Ruichen Li, Binghui Li, Qi Qian, Liwei Wang

    Abstract: Pruning well-trained neural networks is effective to achieve a promising accuracy-efficiency trade-off in computer vision regimes. However, most of existing pruning algorithms only focus on the classification task defined on the source domain. Different from the strong transferability of the original model, a pruned network is hard to transfer to complicated downstream tasks such as object detecti…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: Under review

  32. arXiv:2111.12292  [pdf, other]

    cs.CV cs.LG stat.ML

    Improved Fine-Tuning by Better Leveraging Pre-Training Data

    Authors: Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, Rong Jin

    Abstract: As a dominant paradigm, fine-tuning a pre-trained model on the target data is widely used in many deep learning applications, especially for small data sets. However, recent studies have empirically shown that training from scratch has the final performance that is no worse than this pre-training strategy once the number of training samples is increased in some vision tasks. In this work, we revis…

    Submitted 25 May, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

  33. arXiv:2109.00650  [pdf, other]

    cs.LG cs.CV stat.ML

    Dash: Semi-Supervised Learning with Dynamic Thresholding

    Authors: Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yu-Feng Li, Baigui Sun, Hao Li, Rong Jin

    Abstract: While semi-supervised learning (SSL) has received tremendous attentions in many machine learning tasks due to its successful use of unlabeled data, existing SSL algorithms use either all unlabeled examples or the unlabeled examples with a fixed high-confidence prediction during the training progress. However, it is possible that too many correct/wrong pseudo labeled examples are eliminated/selecte…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: ICML 2021

  34. arXiv:2105.11527  [pdf, other]

    cs.CV

    Unsupervised Visual Representation Learning by Online Constrained K-Means

    Authors: Qi Qian, Yuanhong Xu, Juhua Hu, Hao Li, Rong Jin

    Abstract: Cluster discrimination is an effective pretext task for unsupervised representation learning, which often consists of two phases: clustering and discrimination. Clustering is to assign each instance a pseudo label that will be used to learn representations in discrimination. The main challenge resides in clustering since prevalent clustering methods (e.g., k-means) have to run in a batch mode. Bes…

    Submitted 28 March, 2022; v1 submitted 24 May, 2021; originally announced May 2021.

    Comments: accepted by CVPR'22

  35. arXiv:2105.06015  [pdf, ps, other]

    cs.LG

    Why Does Multi-Epoch Training Help?

    Authors: Yi Xu, Qi Qian, Hao Li, Rong Jin

    Abstract: Stochastic gradient descent (SGD) has become the most attractive optimization method in training large-scale deep neural networks due to its simplicity, low computational cost in each updating step, and good performance. Standard excess risk bounds show that SGD only needs to take one pass over the training data and more passes could not help to improve the performance. Empirically, it has been ob…

    Submitted 12 May, 2021; originally announced May 2021.

  36. arXiv:2104.04114  [pdf, ps, other]

    cs.LG

    A Theoretical Analysis of Learning with Noisily Labeled Data

    Authors: Yi Xu, Qi Qian, Hao Li, Rong Jin

    Abstract: Noisy labels are very common in deep supervised learning. Although many studies tend to improve the robustness of deep training for noisy labels, rare works focus on theoretically explaining the training behaviors of learning with noisily labeled data, which is a fundamental principle in understanding its generalization. In this draft, we study its two phenomena, clean data first and phase transit…

    Submitted 8 April, 2021; originally announced April 2021.

  37. arXiv:2103.11402  [pdf, other]

    cs.CV cs.AI

    Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

    Authors: Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, Hao Li

    Abstract: Supervised learning based object detection frameworks demand plenty of laborious manual annotations, which may not be practical in real applications. Semi-supervised object detection (SSOD) can effectively leverage unlabeled data to improve the model performance, which is of great significance for the application of object detection models. In this paper, we revisit SSOD and propose Instant-Teachi…

    Submitted 21 March, 2021; originally announced March 2021.

  38. arXiv:2102.01063  [pdf, other]

    cs.CV cs.LG

    Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

    Authors: Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin

    Abstract: Accuracy predictor is a key component in Neural Architecture Search (NAS) for ranking architectures. Building a high-quality accuracy predictor usually costs enormous computation. To address this issue, instead of using an accuracy predictor, we propose a novel zero-shot index dubbed Zen-Score to rank the architectures. The Zen-Score represents the network expressivity and positively correlates wi…

    Submitted 22 August, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: accepted by ICCV 2021

    MSC Class: 68T07; 65D19 ACM Class: I.2.10; J.6; I.4.0; I.5.2

  39. arXiv:2010.01267  [pdf, ps, other]

    cs.LG cs.CV math.OC stat.ML

    WeMix: How to Better Utilize Data Augmentation

    Authors: Yi Xu, Asaf Noy, Ming Lin, Qi Qian, Hao Li, Rong Jin

    Abstract: Data augmentation is a widely used training trick in deep learning to improve the network generalization ability. Despite many encouraging results, several recent studies did point out limitations of the conventional data augmentation scheme in certain scenarios, calling for a better theoretical understanding of data augmentation. In this work, we develop a comprehensive analysis that reveals pros…

    Submitted 2 October, 2020; originally announced October 2020.

  40. arXiv:2009.14416  [pdf, other]

    cs.LG cs.CV stat.ML

    Improved Knowledge Distillation via Full Kernel Matrix Transfer

    Authors: Qi Qian, Hao Li, Juhua Hu

    Abstract: Knowledge distillation is an effective way for model compression in deep learning. Given a large model (i.e., teacher model), it aims to improve the performance of a compact model (i.e., student model) by transferring the information from the teacher. Various information for distillation has been studied. Recently, a number of works propose to transfer the pairwise similarity between examples to d…

    Submitted 29 March, 2022; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: accepted by SDM'22

  41. arXiv:2009.04989  [pdf, other]

    cs.CV

    Semi-Anchored Detector for One-Stage Object Detection

    Authors: Lei Chen, Qi Qian, Hao Li

    Abstract: A standard one-stage detector is comprised of two tasks: classification and regression. Anchors of different shapes are introduced for each location in the feature map to mitigate the challenge of regression for multi-scale objects. However, the performance of classification can degrade due to the highly class-imbalanced problem in anchors. Recently, many anchor-free algorithms have been proposed…

    Submitted 10 September, 2020; originally announced September 2020.

  42. arXiv:2006.14090  [pdf, other]

    cs.CV

    Neural Architecture Design for GPU-Efficient Networks

    Authors: Ming Lin, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin

    Abstract: Many mission-critical systems are based on GPU for inference. It requires not only high recognition accuracy but also low latency in responding time. Although many studies are devoted to optimizing the structure of deep models for efficient inference, most of them do not leverage the architecture of modern GPU for fast inference, leading to suboptimal performance. To address this issue, w…

    Submitted 11 August, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: update training setting

  43. arXiv:2006.11653  [pdf, other]

    cs.LG cs.CV stat.ML

    Towards Understanding Label Smoothing

    Authors: Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin

    Abstract: Label smoothing regularization (LSR) has a great success in training deep neural networks by stochastic algorithms such as stochastic gradient descent and its variants. However, the theoretical understanding of its power from the view of optimization is still rare. This study opens the door to a deep understanding of LSR by initiating the analysis. In this paper, we analyze the convergence behavio…

    Submitted 2 October, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

  44. arXiv:2005.09681  [pdf, other]

    cs.CV

    Weakly Supervised Representation Learning with Coarse Labels

    Authors: Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Juhua Hu

    Abstract: With the development of computational power and techniques for data collection, deep learning demonstrates a superior performance over most existing algorithms on visual benchmark data sets. Many efforts have been devoted to studying the mechanism of deep learning. One important observation is that deep learning can learn the discriminative patterns from raw materials directly in a task-dependent…

    Submitted 24 August, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: accepted by ICCV'21

  45. arXiv:1911.04047  [pdf, other]

    cs.CV cs.LG

    Hierarchically Robust Representation Learning

    Authors: Qi Qian, Juhua Hu, Hao Li

    Abstract: With the tremendous success of deep learning in visual tasks, the representations extracted from intermediate layers of learned models, that is, deep features, attract much attention of researchers. Previous empirical analysis shows that those features can contain appropriate semantic information. Therefore, with a model trained on a large-scale benchmark data set (e.g., ImageNet), the extracted f…

    Submitted 27 March, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: accepted by CVPR'20

  46. arXiv:1909.05235  [pdf, other]

    cs.CV

    SoftTriple Loss: Deep Metric Learning Without Triplet Sampling

    Authors: Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, Rong Jin

    Abstract: Distance metric learning (DML) is to learn the embeddings where examples from the same class are closer than examples from different classes. It can be cast as an optimization problem with triplet constraints. Due to the vast number of triplet constraints, a sampling strategy is essential for DML. With the tremendous success of deep learning in classifications, it has been applied for DML. When le…

    Submitted 14 April, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: accepted by ICCV'19

  47. arXiv:1907.10156  [pdf, other]

    cs.CV

    DR Loss: Improving Object Detection by Distributional Ranking

    Authors: Qi Qian, Lei Chen, Hao Li, Rong Jin

    Abstract: Most of object detection algorithms can be categorized into two classes: two-stage detectors and one-stage detectors. Recently, many efforts have been devoted to one-stage detectors for the simple yet effective architecture. Different from two-stage detectors, one-stage detectors aim to identify foreground objects from all candidates in a single stage. This architecture is efficient but can suffer…

    Submitted 13 April, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

    Comments: accepted by CVPR'20

  48. arXiv:1906.03559  [pdf, other]

    stat.ML cs.LG

    The Implicit Bias of AdaGrad on Separable Data

    Authors: Qian Qian, Xiaoyuan Qian

    Abstract: We study the implicit bias of AdaGrad on separable linear classification problems. We show that AdaGrad converges to a direction that can be characterized as the solution of a quadratic optimization problem with the same feasible set as the hard SVM problem. We also give a discussion about how different choices of the hyperparameters of AdaGrad might impact this direction. This provides a deeper u…

    Submitted 9 June, 2019; originally announced June 2019.

  49. arXiv:1906.01095  [pdf, other]

    stat.ML cs.LG eess.SP

    Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement

    Authors: Ming Lin, Xiaomin Song, Qi Qian, Hao Li, Liang Sun, Shenghuo Zhu, Rong Jin

    Abstract: Satellite-based positioning system such as GPS often suffers from large amount of noise that degrades the positioning accuracy dramatically especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals for training. The Gaussian Process (GP) regression…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: accepted by SIGKDD 2019

  50. arXiv:1901.11149  [pdf, other]

    stat.ML cs.LG

    Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee

    Authors: Ming Lin, Shuang Qiu, Jieping Ye, Xiaomin Song, Qi Qian, Liang Sun, Shenghuo Zhu, Rong Jin

    Abstract: Factorization machine (FM) is a popular machine learning model to capture the second order feature interactions. The optimal learning guarantee of FM and its generalized version is not yet developed. For a rank $k$ generalized FM of $d$ dimensional input, the previous best known sampling complexity is $\mathcal{O}[k^{3}d\cdot\mathrm{polylog}(kd)]$ under Gaussian distribution. This bound is sub-opt…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: accepted by AAAI 2019