Showing 1–50 of 117 results for author: Ding, G

Searching in archive cs.
  1. arXiv:2406.18129

    cs.CV cs.LG

    CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

    Authors: Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao

    Abstract: Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been d…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2405.14458

    cs.CV

    YOLOv10: Real-Time End-to-End Object Detection

    Authors: Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding

    Abstract: Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on the non-maximum sup…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/THU-MIG/yolov10

  3. arXiv:2405.13870

    cs.CV

    FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

    Authors: Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen

    Abstract: Benefiting from large-scale pre-trained text-to-image (T2I) generative models, impressive progress has been achieved in customized image generation, which aims to generate user-specified concepts. Existing approaches have extensively focused on single-concept customization and still encounter challenges when it comes to complex scenarios that involve combining multiple concepts. These approaches o…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: CVPR2024

  4. arXiv:2405.05164

    cs.CV

    ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion

    Authors: Bing Zhu, Zixin He, Weiyi Xiong, Guanhua Ding, Jianan Liu, Tao Huang, Wei Chen, Wei Xiang

    Abstract: Millimeter wave (mmWave) radar is a non-intrusive privacy and relatively convenient and inexpensive device, which has been demonstrated to be applicable in place of RGB cameras in human indoor pose estimation tasks. However, mmWave radar relies on the collection of reflected signals from the target, and the radar signals containing information is difficult to be fully applied. This has been a long…

    Submitted 28 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  5. arXiv:2405.00749

    cs.CV cs.LG

    More is Better: Deep Domain Adaptation with Multiple Sources

    Authors: Sicheng Zhao, Hui Chen, Hu Huang, Pengfei Xu, Guiguang Ding

    Abstract: In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to d…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024. arXiv admin note: text overlap with arXiv:2002.12169

  6. arXiv:2404.17808

    cs.CL

    Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

    Authors: Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding

    Abstract: Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while keeping all tokens that have be…

    Submitted 27 April, 2024; originally announced April 2024.

  7. arXiv:2404.17785

    cs.CL

    Temporal Scaling Law for Large Language Models

    Authors: Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jianwei Niu, Guiguang Ding

    Abstract: Recently, Large Language Models (LLMs) have been widely adopted in a wide range of tasks, leading to increasing attention towards the research on how scaling LLMs affects their performance. Existing works, termed Scaling Laws, have discovered that the final test loss of LLMs scales as power-laws with model size, computational budget, and dataset size. However, the temporal change of the test loss…

    Submitted 16 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures; Under review

  8. arXiv:2404.10292

    cs.CV cs.MM

    From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search

    Authors: **tao Sun, Zhedong Zheng, Gangyi Ding

    Abstract: In text-based person search endeavors, data generation has emerged as a prevailing practice, addressing concerns over privacy preservation and the arduous task of manual annotation. Although the number of synthesized data can be infinite in theory, the scientific conundrum persists that how much generated data optimally fuels subsequent model training. We observe that only a subset of the data in…

    Submitted 16 April, 2024; originally announced April 2024.

  9. arXiv:2403.19969

    cs.CV cs.LG

    Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks

    Authors: Guanhua Ding, Zexi Ye, Zhen Zhong, Gang Li, David Shao

    Abstract: Deep Neural Network (DNN) pruning has emerged as a key strategy to reduce model size, improve inference latency, and lower power consumption on DNN accelerators. Among various pruning techniques, block and output channel pruning have shown significant potential in accelerating hardware performance. However, their accuracy often requires further improvement. In response to this challenge, we introd…

    Submitted 29 March, 2024; originally announced March 2024.

  10. arXiv:2403.09192

    cs.CV

    PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

    Authors: Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding

    Abstract: Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and model compression, have separately investigated the challenges. However, PEFT cannot guarantee the inference efficiency of the original backbone, espe…

    Submitted 1 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 15 pages, 5 figures, Accepted by ECCV 2024

  11. arXiv:2403.06423

    eess.SP cs.RO

    LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association

    Authors: Guanhua Ding, Jianan Liu, Yuxuan Xia, Tao Huang, Bing Zhu, **** Sun

    Abstract: Multiple extended target tracking (ETT) has gained increasing attention due to the development of high-precision LiDAR and radar sensors in automotive applications. For LiDAR point cloud-based vehicle tracking, this paper presents a probabilistic measurement-region association (PMRA) ETT model, which can describe the complex measurement distribution by partitioning the target extent into different…

    Submitted 18 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures, accepted by the 27th International Conference on Information Fusion (FUSION 2024)

  12. arXiv:2403.06102

    cs.CV

    Coherent Temporal Synthesis for Incremental Action Segmentation

    Authors: Guodong Ding, Hans Golong, Angela Yao

    Abstract: Data replay is a successful incremental learning technique for images. It prevents catastrophic forgetting by keeping a reservoir of previous data, original or synthesized, to ensure the model retains past knowledge while adapting to novel concepts. However, its application in the video domain is rudimentary, as it simply stores frame exemplars for action recognition. This paper presents the first…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 10 pages, 6 figures, 5 tables, accepted to CVPR 2024

  13. arXiv:2402.18821

    cs.CV

    Debiased Novel Category Discovering and Localization

    Authors: Juexiao Feng, Yuhong Yang, Yanchun Xie, Yaqian Li, Yandong Guo, Yuchen Guo, Yuwei He, Liuyu Xiang, Guiguang Ding

    Abstract: In recent years, object detection in deep learning has experienced rapid development. However, most existing object detection models perform well only on closed-set datasets, ignoring a large number of potential objects whose categories are not defined in the training set. These objects are often identified as background or incorrectly classified as pre-defined categories by the detectors. In this…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI 2024

  14. arXiv:2312.16145

    cs.CV cs.AI cs.LG

    One-Dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

    Authors: Mengyao Lyu, Yuhong Yang, Haiwen Hong, Hui Chen, Xuan **, Yuan He, Hui Xue, Jungong Han, Guiguang Ding

    Abstract: The prevalent use of commercial and open-source diffusion models (DMs) for text-to-image generation prompts risk mitigation to prevent undesired behaviors. Existing concept erasing methods in academia are all based on full parameter or specification-based fine-tuning, from which we observe the following issues: 1) Generation alternation towards erosion: Parameter drift during target elimination ca…

    Submitted 11 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  15. arXiv:2312.13910

    cs.RO cs.LG cs.MA

    Multi-Agent Probabilistic Ensembles with Trajectory Sampling for Connected Autonomous Vehicles

    Authors: Ruoqi Wen, Jiahao Huang, Rongpeng Li, Guoru Ding, Zhifeng Zhao

    Abstract: Autonomous Vehicles (AVs) have attracted significant attention in recent years and Reinforcement Learning (RL) has shown remarkable performance in improving the autonomy of vehicles. In that regard, the widely adopted Model-Free RL (MFRL) promises to solve decision-making tasks in connected AVs (CAVs), contingent on the readiness of a significant amount of data samples for training. Nevertheless,…

    Submitted 21 December, 2023; originally announced December 2023.

  16. arXiv:2312.10813

    cs.CV cs.CL cs.LG

    Re-parameterized Low-rank Prompt: Generalize a Vision-Language Model within 0.5K Parameters

    Authors: Tianxiang Hao, Mengyao Lyu, Hui Chen, Sicheng Zhao, Jungong Han, Guiguang Ding

    Abstract: With the development of large pre-trained vision-language models, how to effectively transfer the knowledge of such foundational models to downstream tasks becomes a hot topic, especially in a data-deficient scenario. Recently, prompt tuning has become a popular solution. When adapting the vision-language models, researchers freeze the parameters in the backbone and only design and tune the prompt…

    Submitted 11 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

  17. arXiv:2312.05760

    cs.CV

    RepViT-SAM: Towards Real-Time Segmenting Anything

    Authors: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding

    Abstract: Segment Anything Model (SAM) has shown impressive zero-shot transfer performance for various computer vision tasks recently. However, its heavy computation costs remain daunting for practical applications. MobileSAM proposes to replace the heavyweight image encoder in SAM with TinyViT by employing distillation, which results in a significant reduction in computational requirements. However, its de…

    Submitted 29 February, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: Technical report of RepViT+SAM in our CVPR 2024 work. Project page: https://jameslahm.github.io/repvit-sam/

  18. arXiv:2310.19531

    cs.CL

    MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models

    Authors: Zhenpeng Su, Xing Wu, Xue Bai, Zijia Lin, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

    Abstract: Generative language models are usually pretrained on large text corpus via predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpus during training, i.e., the imb…

    Submitted 28 March, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by NAACL 2024

  19. arXiv:2310.12149

    cs.CV

    Object-aware Inversion and Reassembly for Image Editing

    Authors: Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen

    Abstract: By comparing the original and target prompts, we can obtain numerous editing pairs, each comprising an object and its corresponding editing target. To allow editability while maintaining fidelity to the input image, existing editing methods typically involve a fixed number of inversion steps that project the whole input image to its noisier latent representation, followed by a denoising process gu…

    Submitted 18 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Project Page: https://aim-uofa.github.io/OIR-Diffusion/

  20. arXiv:2309.15755

    cs.CV

    CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs

    Authors: Ao Wang, Hui Chen, Zijia Lin, Sicheng Zhao, Jungong Han, Guiguang Ding

    Abstract: Vision Transformers (ViTs) have emerged as state-of-the-art models for various vision tasks recently. However, their heavy computation costs remain daunting for resource-limited devices. Consequently, researchers have dedicated themselves to compressing redundant information in ViTs for acceleration. However, they generally sparsely drop redundant image tokens by token pruning or brutally remove c…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  21. arXiv:2309.15575

    cs.CV

    Confidence-based Visual Dispersal for Few-shot Unsupervised Domain Adaptation

    Authors: Yizhe Xiong, Hui Chen, Zijia Lin, Sicheng Zhao, Guiguang Ding

    Abstract: Unsupervised domain adaptation aims to transfer knowledge from a fully-labeled source domain to an unlabeled target domain. However, in real-world scenarios, providing abundant labeled data even in the source domain can be infeasible due to the difficulty and high expense of annotation. To address this issue, recent works consider the Few-shot Unsupervised Domain Adaptation (FUDA) where only a few…

    Submitted 29 September, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted as ICCV 2023 poster (https://openaccess.thecvf.com/content/ICCV2023/html/Xiong_Confidence-based_Visual_Dispersal_for_Few-shot_Unsupervised_Domain_Adaptation_ICCV_2023_paper.html)

  22. arXiv:2308.14332

    cs.CV cs.RO

    Attention-Guided Lidar Segmentation and Odometry Using Image-to-Point Cloud Saliency Transfer

    Authors: Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, Masahiro Murakawa, Ryosuke Nakamura

    Abstract: LiDAR odometry estimation and 3D semantic segmentation are crucial for autonomous driving, which has achieved remarkable advances recently. However, these tasks are challenging due to the imbalance of points in different semantic categories for 3D semantic segmentation and the influence of dynamic objects for LiDAR odometry estimation, which increases the importance of using representative/salient…

    Submitted 16 June, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: 33 pages, 12 Figures, 6 Tables, accepted to appear in Multimedia Systems journal (2024)

  23. arXiv:2307.16453

    cs.AI cs.LO

    Every Mistake Counts in Assembly

    Authors: Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao

    Abstract: One promising use case of AI assistants is to help with complex procedures like cooking, home repair, and assembly tasks. Can we teach the assistant to interject after the user makes a mistake? This paper targets the problem of identifying ordering mistakes in assembly procedures. We propose a system that can detect ordering mistakes by utilizing a learned knowledge base. Our framework constructs…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 10 pages, 5 figures

  24. arXiv:2307.09283

    cs.CV

    RepViT: Revisiting Mobile CNN From ViT Perspective

    Authors: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding

    Abstract: Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency, compared with lightweight Convolutional Neural Networks (CNNs), on resource-constrained mobile devices. Researchers have discovered many structural connections between lightweight ViTs and lightweight CNNs. However, the notable architectural disparities in the block structure, macro, and micro desi…

    Submitted 14 March, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: CVPR 2024 Camera-ready Version

  25. arXiv:2305.00603

    cs.CV cs.AI cs.LG

    Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation

    Authors: Tianxiang Hao, Hui Chen, Yuchen Guo, Guiguang Ding

    Abstract: Recently, transformers have shown strong ability as visual feature extractors, surpassing traditional convolution-based models in various scenarios. However, the success of vision transformers largely owes to their capacity to accommodate numerous parameters. As a result, new challenges for adapting large models to downstream tasks arise. On the one hand, classic fine-tuning tunes all parameters i…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: ICLR 2023

  26. arXiv:2304.07728

    cs.RO cs.AI cs.CV

    TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion Odometry Estimation

    Authors: Leyuan Sun, Guanqun Ding, Yue Qiu, Yusuke Yoshiyasu, Fumio Kanehiro

    Abstract: Multi-modal fusion of sensors is a commonly used approach to enhance the performance of odometry estimation, which is also a fundamental module for mobile robots. However, the question of \textit{how to perform fusion among different modalities in a supervised sensor fusion odometry estimation task?} is still one of challenging issues remains. Some simple operations, such as element-wise summation…

    Submitted 25 April, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

    Comments: Submitted to IEEE Sensors Journal with some modifications. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2303.13089

    cs.CV cs.LG

    Box-Level Active Detection

    Authors: Mengyao Lyu, Jundong Zhou, Hui Chen, Yijie Huang, Dongdong Yu, Yaqian Li, Yandong Guo, Yuchen Guo, Liuyu Xiang, Guiguang Ding

    Abstract: Active learning selects informative samples for annotation within budget, which has proven efficient recently on object detection. However, the widely used active detection benchmarks conduct image-level evaluation, which is unrealistic in human workload estimation and biased towards crowded images. Furthermore, existing methods still perform image-level annotation, but equally scoring all targets…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 highlight

  28. arXiv:2302.02075

    cs.CV

    X-ReID: Cross-Instance Transformer for Identity-Level Person Re-Identification

    Authors: Leqi Shen, Tao He, Yuchen Guo, Guiguang Ding

    Abstract: Currently, most existing person re-identification methods use Instance-Level features, which are extracted only from a single image. However, these Instance-Level features can easily ignore the discriminative information due to the appearance of each identity varies greatly in different images. Thus, it is necessary to exploit Identity-Level features, which can be shared across different images of…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: 9 pages

  29. arXiv:2302.00179

    cs.CV

    Stable Attribute Group Editing for Reliable Few-shot Image Generation

    Authors: Guanqi Ding, Xinzhe Han, Shuhui Wang, Xin **, Dandan Tu, Qingming Huang

    Abstract: Few-shot image generation aims to generate data of an unseen category based on only a few samples. Apart from basic content generation, a bunch of downstream applications hopefully benefit from this task, such as low-data detection and few-shot classification. To achieve this goal, the generated images should guarantee category retention for classification beyond the visual quality and diversity.…

    Submitted 31 January, 2023; originally announced February 2023.

  30. arXiv:2211.11921

    cs.CV

    Confidence-guided Centroids for Unsupervised Person Re-Identification

    Authors: Yunqi Miao, Jiankang Deng, Guiguang Ding, Jungong Han

    Abstract: Unsupervised person re-identification (ReID) aims to train a feature extractor for identity retrieval without exploiting identity labels. Due to the blind trust in imperfect clustering results, the learning is inevitably misled by unreliable pseudo labels. Albeit the pseudo label refinement has been investigated by previous works, they generally leverage auxiliary information such as camera IDs an…

    Submitted 21 November, 2022; originally announced November 2022.

  31. arXiv:2211.05910

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, **gang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, **woo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  32. arXiv:2211.01556

    cs.CV

    Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection

    Authors: Fan Yang, Xinhao Xu, Hui Chen, Yuchen Guo, Jungong Han, Kai Ni, Guiguang Ding

    Abstract: The ground plane prior is a very informative geometry clue in monocular 3D object detection (M3OD). However, it has been neglected by most mainstream methods. In this paper, we identify two key factors that limit the applicability of ground plane prior: the projection point localization issue and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we propose a Ground Plane Enh…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: 13 pages, 10 figures

  33. arXiv:2210.10352

    cs.CV

    Temporal Action Segmentation: An Analysis of Modern Techniques

    Authors: Guodong Ding, Fadime Sener, Angela Yao

    Abstract: Temporal action segmentation (TAS) in videos aims at densely identifying video frames in minutes-long videos with multiple action classes. As a long-range video understanding task, researchers have developed an extended collection of methods and examined their performance using various benchmarks. Despite the rapid growth of TAS techniques in recent years, no systematic survey has been conducted i…

    Submitted 21 October, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 19 pages, 9 figures, 8 tables, TPAMI 2023

  34. Visualizing the Scripts of Data Wrangling with SOMNUS

    Authors: Kai Xiong, Siwei Fu, Guoming Ding, Zhongsu Luo, Rong Yu, Wei Chen, Hujun Bao, Yingcai Wu

    Abstract: Data workers use various scripting languages for data transformation, such as SAS, R, and Python. However, understanding intricate code pieces requires advanced programming skills, which hinders data workers from grasping the idea of data transformation at ease. Program visualization is beneficial for debugging and education and has the potential to illustrate transformations intuitively and inter…

    Submitted 28 September, 2022; originally announced September 2022.

  35. arXiv:2208.08606

    cs.LG

    AoI-based Temporal Attention Graph Neural Network for Popularity Prediction and Content Caching

    Authors: Jianhang Zhu, Rongpeng Li, Guoru Ding, Chan Wang, Jianjun Wu, Zhifeng Zhao, Honggang Zhang

    Abstract: Along with the fast development of network technology and the rapid growth of network equipment, the data throughput is sharply increasing. To handle the problem of backhaul bottleneck in cellular network and satisfy people's requirements about latency, the network architecture like information-centric network (ICN) intends to proactively keep limited popular content at the edge of network based o…

    Submitted 17 August, 2022; originally announced August 2022.

  36. arXiv:2207.08653

    cs.CV cs.AI

    Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

    Authors: Guodong Ding, Angela Yao

    Abstract: We present a semi-supervised learning approach to the temporal action segmentation task. The goal of the task is to temporally detect and segment actions in long, untrimmed procedural videos, where only a small set of videos are densely labelled, and a large collection of videos are unlabelled. To this end, we propose two novel loss functions for the unlabelled data: an action affinity loss and an…

    Submitted 21 July, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: 16 pages, 5 figures

  37. arXiv:2207.05933

    cs.CV

    Rapid Person Re-Identification via Sub-space Consistency Regularization

    Authors: Qingze Yin, Guanan Wang, Guodong Ding, Qilei Li, Shaogang Gong, Zhenmin Tang

    Abstract: Person Re-Identification (ReID) matches pedestrians across disjoint cameras. Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation as well as complex quick-sort algorithms. Recently, some works propose to yield binary encoded person descriptors which instead only require fast Hamming…

    Submitted 12 July, 2022; originally announced July 2022.

  38. arXiv:2205.15242

    cs.LG cs.AI cs.CV

    Re-parameterizing Your Optimizers rather than Architectures

    Authors: Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding

    Abstract: The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers such as SGD. In this paper, we propose to incorporate model-specific prior knowledge into optimizers by modifying the gradients according to a set of model-specific hyper-parameter…

    Submitted 9 February, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: ICLR 2023

  39. arXiv:2205.13248

    cs.LG cs.IR

    Constrained Reinforcement Learning for Short Video Recommendation

    Authors: Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, **hua Gong, Dong Zheng, Peng Jiang

    Abstract: The wide popularity of short videos on social media poses new opportunities and challenges to optimize recommender systems on the video-sharing platforms. Users provide complex and multi-faceted responses towards recommendations, including watch time and various types of interactions with videos. As a result, established recommendation algorithms that concern a single objective are not adequate to…

    Submitted 26 May, 2022; originally announced May 2022.

  40. arXiv:2205.12073

    eess.SP cs.IT

    Edge Semantic Cognitive Intelligence for 6G Networks: Novel Theoretical Models, Enabling Framework, and Typical Applications

    Authors: Peihao Dong, Qihui Wu, Xiaofei Zhang, Guoru Ding

    Abstract: Edge intelligence is anticipated to underlay the pathway to connected intelligence for 6G networks, but the organic confluence of edge computing and artificial intelligence still needs to be carefully treated. To this end, this article discusses the concepts of edge intelligence from the semantic cognitive perspective. Two instructive theoretical models for edge semantic cognitive intelligence (ES…

    Submitted 9 July, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: 9 pages, 7 figures, accepted by China Communications as an invited paper

  41. arXiv:2205.03124

    cs.CV

    A High-Accuracy Unsupervised Person Re-identification Method Using Auxiliary Information Mined from Datasets

    Authors: Hehan Teng, Tao He, Yuchen Guo, Guiguang Ding

    Abstract: Supervised person re-identification methods rely heavily on high-quality cross-camera training label. This significantly hinders the deployment of re-ID models in real-world applications. The unsupervised person re-ID methods can reduce the cost of data annotation, but their performance is still far lower than the supervised ones. In this paper, we make full use of the auxiliary information mined…

    Submitted 6 May, 2022; originally announced May 2022.

  42. arXiv:2204.00891  [pdf, other]

    cs.CV

    A Free Lunch to Person Re-identification: Learning from Automatically Generated Noisy Tracklets

    Authors: Hehan Teng, Tao He, Yuchen Guo, Zhenhua Guo, Guiguang Ding

    Abstract: A series of unsupervised video-based re-identification (re-ID) methods have been proposed to solve the problem of high labor cost required to annotate re-ID datasets. But their performance is still far lower than the supervised counterparts. In the meantime, clean datasets without noise are used in these methods, which is not realistic. In this paper, we propose to tackle this problem by learning…

    Submitted 2 April, 2022; originally announced April 2022.

  43. arXiv:2204.00103  [pdf, other]

    stat.ML cs.LG

    Scalable Whitebox Attacks on Tree-based Models

    Authors: Giuseppe Castiglione, Gavin Ding, Masoud Hashemi, Christopher Srinivasa, Ga Wu

    Abstract: Adversarial robustness is one of the essential safety criteria for guaranteeing the reliability of machine learning models. While various adversarial robustness testing approaches were introduced in the last decade, we note that most of them are incompatible with non-differentiable models such as tree ensembles. Since tree ensembles are widely used in industry, this reveals a crucial gap between a…

    Submitted 31 March, 2022; originally announced April 2022.

  44. arXiv:2203.16894  [pdf, ps, other]

    cs.IT eess.SP

    Analysis and Optimization of A Double-IRS Cooperatively Assisted System with A Quasi-Static Phase Shift Design

    Authors: Gengfa Ding, Feng Yang, Lianghui Ding, Ying Cui

    Abstract: The analysis and optimization of single intelligent reflecting surface (IRS)-assisted systems have been extensively studied, whereas little is known regarding multiple-IRS-assisted systems. This paper investigates the analysis and optimization of a double-IRS cooperatively assisted downlink system, where a multi-antenna base station (BS) serves a single-antenna user with the help of two multi-elem…

    Submitted 4 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: 44 pages, 10 figures. To appear in SPAWC 2022; this work is submitted to IEEE Trans. Wireless Commun. (under major revision)

  45. arXiv:2203.08422  [pdf, other]

    cs.CV

    Attribute Group Editing for Reliable Few-shot Image Generation

    Authors: Guanqi Ding, Xinzhe Han, Shuhui Wang, Shuzhe Wu, Xin Jin, Dandan Tu, Qingming Huang

    Abstract: Few-shot image generation is a challenging task even using the state-of-the-art Generative Adversarial Networks (GANs). Due to the unstable GAN training process and the limited training data, the generated images are often of low quality and low diversity. In this work, we propose a new editing-based method, i.e., Attribute Group Editing (AGE), for few-shot image generation. The basic assumption i…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: CVPR2022

  46. arXiv:2203.06717  [pdf, other]

    cs.CV cs.AI cs.LG

    Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

    Authors: Xiaohan Ding, Xiangyu Zhang, Yizhuang Zhou, Jungong Han, Guiguang Ding, Jian Sun

    Abstract: We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm. We suggested five guidelines, e.g., applying re-parameterized large depth-wise convolutions, to design efficient hig…

    Submitted 2 April, 2022; v1 submitted 13 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR 2022

  47. arXiv:2203.04406  [pdf]

    cs.CR cs.CY cs.SI eess.SY

    Routing with Privacy for Drone Package Delivery Systems

    Authors: Geoffrey Ding, Alex Berke, Karthik Gopalakrishnan, Kwassi H. Degue, Hamsa Balakrishnan, Max Z. Li

    Abstract: Unmanned aerial vehicles (UAVs), or drones, are increasingly being used to deliver goods from vendors to customers. To safely conduct these operations at scale, drones are required to broadcast position information as codified in remote identification (remote ID) regulations. However, location broadcast of package delivery drones introduces a privacy risk for customers using these delivery service…

    Submitted 29 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Journal ref: International Conference on Research in Air Transportation (ICRAT) 2022

  48. arXiv:2112.14239  [pdf, other]

    cs.CV

    TAGPerson: A Target-Aware Generation Pipeline for Person Re-identification

    Authors: Kai Chen, Weihua Chen, Tao He, Rong Du, Fan Wang, Xiuyu Sun, Yuchen Guo, Guiguang Ding

    Abstract: Nowadays, real data in person re-identification (ReID) task is facing privacy issues, e.g., the banned dataset DukeMTMC-ReID. Thus it becomes much harder to collect real data for ReID task. Meanwhile, the labor cost of labeling ReID data is still very high and further hinders the development of the ReID research. Therefore, many methods turn to generate synthetic images for ReID algorithms as alte…

    Submitted 28 December, 2021; originally announced December 2021.

  49. arXiv:2112.11081  [pdf, other]

    cs.CV cs.AI cs.LG

    RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality

    Authors: Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Jungong Han, Guiguang Ding

    Abstract: Compared to convolutional layers, fully-connected (FC) layers are better at modeling the long-range dependencies but worse at capturing the local patterns, hence usually less favored for image recognition. In this paper, we propose a methodology, Locality Injection, to incorporate local priors into an FC layer via merging the trained parameters of a parallel conv kernel into the FC kernel. Localit…

    Submitted 30 March, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

    Comments: Accepted by CVPR-2022. This is the latest version

  50. arXiv:2112.03731  [pdf, other]

    cs.CV

    SalFBNet: Learning Pseudo-Saliency Distribution via Feedback Convolutional Networks

    Authors: Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, Masahiro Murakawa, Ryosuke Nakamura

    Abstract: Feed-forward only convolutional neural networks (CNNs) may ignore intrinsic relationships and potential benefits of feedback connections in vision tasks such as saliency detection, despite their significant representation capabilities. In this work, we propose a feedback-recursive convolutional framework (SalFBNet) for saliency detection. The proposed feedback model can learn abundant contextual r…

    Submitted 10 January, 2022; v1 submitted 7 December, 2021; originally announced December 2021.