Showing 1–47 of 47 results for author: Tong, Z

Searching in archive cs.
  1. arXiv:2406.16177

    cs.HC

    Flowy: Supporting UX Design Decisions Through AI-Driven Pattern Annotation in Multi-Screen User Flows

    Authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Yewon Oh, Bryan Wang, Toby Jia-Jun Li

    Abstract: Many recent AI-powered UX design tools focus on generating individual static UI screens from natural language. However, they overlook the crucial aspect of interactions and user experiences across multiple screens. Through formative studies with UX professionals, we identified limitations of these tools in supporting realistic UX design workflows. In response, we designed and developed Flowy, an a…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2405.18910

    cs.AI

    Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach

    Authors: Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

    Abstract: The increasing number of vehicles highlights the need for efficient parking space management. Predicting real-time Parking Availability (PA) can help mitigate traffic congestion and the corresponding social problems, which is a pressing issue in densely populated cities like Singapore. In this study, we aim to collectively predict future PA across Singapore with complex factors from various domain…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024 (Multi-Year Track On AI And Social Good with ~20% acceptance rate)

  3. arXiv:2404.14464

    cs.CL cs.AI cs.IR

    Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering

    Authors: Li Jiapeng, Liu Runze, Li Yabo, Zhou Tong, Li Mingling, Chen Xiang

    Abstract: Multi-hop question answering is a knowledge-intensive complex problem. Large Language Models (LLMs) use their Chain of Thoughts (CoT) capability to reason through complex problems step by step, and retrieval-augmentation can effectively alleviate factual errors caused by outdated and unknown knowledge in LLMs. Recent works have introduced retrieval-augmentation in the CoT reasoning to solve multi-hop ques…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Keywords: Multi-hop Question Answering; Retrieval-Augmented Generation; Tree of Thought; Reasoning. TLDR: We propose a tree-based, dynamic, iterative retrieval framework for multi-hop question answering.

  4. arXiv:2403.12922

    cs.CV

    Contextual AD Narration with Interleaved Multimodal Sequence

    Authors: Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang

    Abstract: The Audio Description (AD) task aims to generate descriptions of visual elements for visually impaired individuals to help them access long-form video content, such as movies. With video features, text, a character bank, and context information as inputs, the generated ADs are able to correspond to the characters by name and provide reasonable, contextual descriptions to help the audience understand the stor…

    Submitted 19 March, 2024; originally announced March 2024.

  5. arXiv:2312.14149

    cs.CV cs.AI

    TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

    Authors: Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen

    Abstract: The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data. Existing attempts usually face the problem of coarse alignment, e.g., the vision encoder struggles in localizing an attribute-specified object. In this work, we propose an embarrassingly simple approach to better align image and text features with no need of additional data f…

    Submitted 26 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

  6. arXiv:2312.01987

    cs.CV

    Bootstrapping SparseFormers from Vision Foundation Models

    Authors: Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou

    Abstract: The recently proposed SparseFormer architecture provides an alternative approach to visual understanding by utilizing a significantly lower number of visual tokens via adjusting RoIs, greatly reducing computational costs while still achieving promising performance. However, training SparseFormers from scratch is still expensive, and scaling up the number of parameters can be challenging. In this p…

    Submitted 4 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  7. arXiv:2311.15157

    cs.CV

    Advancing Vision Transformers with Group-Mix Attention

    Authors: Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

    Abstract: Vision Transformers (ViTs) have been shown to enhance visual recognition through modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as Query-Key-Value computation. However, the attention map generated from the Query and Key captures only token-to-token correlations at one single granularity. In this paper, we argue that self-attention should have…

    Submitted 25 November, 2023; originally announced November 2023.

  8. arXiv:2310.15455

    cs.HC cs.AI

    UI Layout Generation with LLMs Guided by UI Grammar

    Authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li

    Abstract: The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we propose to represent the hierarch…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: ICML 2023 Workshop on AI and HCI

  9. arXiv:2309.13942

    cs.CV cs.MM cs.SD eess.AS

    Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

    Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

    Abstract: This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-vi…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Published at the CVPR 2023 Sight and Sound workshop

  10. arXiv:2305.14173

    cs.CV cs.AI

    TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

    Authors: Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan

    Abstract: The ultimate goal for foundation models is to be task-agnostic, i.e., to support out-of-the-box usage without task-specific fine-tuning. Although breakthroughs have been made in natural language processing and image representation learning, it is still challenging for video models to reach it due to the increasing uncertainty of spatiotemporal signals. To ease training, existing works leverage…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Technical Report

  11. arXiv:2305.07095

    cs.CL cs.AI cs.LG

    Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales

    Authors: Brihi Joshi, Ziyi Liu, Sahana Ramnath, Aaron Chan, Zhewei Tong, Shaoliang Nie, Qifan Wang, Yejin Choi, Xiang Ren

    Abstract: Among the remarkable emergent capabilities of large language models (LMs) is free-text rationalization; beyond a certain scale, large LMs are capable of generating seemingly useful rationalizations, which, in turn, can dramatically enhance their performance on leaderboards. This phenomenon raises a question: can machine generated rationales also be useful for humans, especially when lay humans try…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

  12. arXiv:2304.08451

    cs.CV

    Efficient Video Action Detection with Token Dropout and Context Refinement

    Authors: Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang

    Abstract: Streaming video clips with large-scale video tokens impede vision transformers (ViTs) for efficient recognition, especially in video action detection where sufficient spatiotemporal representations are required for precise actor identification. In this work, we propose an end-to-end framework for efficient video action detection (EVAD) based on vanilla ViTs. Our EVAD consists of two specialized de…

    Submitted 28 August, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: technical report

  13. arXiv:2304.03768

    cs.CV

    SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

    Authors: Ziteng Gao, Zhan Tong, Limin Wang, Mike Zheng Shou

    Abstract: Human visual recognition is a sparse process, where only a few salient visual cues are attended to rather than traversing every detail uniformly. However, most current vision networks follow a dense paradigm, processing every single visual unit (e.g., pixel or patch) in a uniform manner. In this paper, we challenge this dense paradigm and present a new method, coined SparseFormer, to imitate human…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: Technical report

  14. arXiv:2303.17142

    cs.CV

    Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning

    Authors: Chongjian Ge, Jiangliu Wang, Zhan Tong, Shoufa Chen, Yibing Song, Ping Luo

    Abstract: Contrastive learning methods train visual encoders by comparing views from one instance to others. Typically, the views created from one instance are set as positive, while views from other instances are negative. This binary instance discrimination is studied extensively to improve feature representations in self-supervised learning. In this paper, we rethink the instance discrimination framework…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted by ICLR23

  15. arXiv:2303.16727

    cs.CV cs.LG

    VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

    Authors: Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao

    Abstract: Scale is the primary factor for building a powerful foundation model that could well generalize to a variety of downstream tasks. However, it is still challenging to train video foundation models with billions of parameters. This paper shows that video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models. We scale the VideoMAE in…

    Submitted 18 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 camera-ready version

  16. arXiv:2303.16118

    cs.CV

    CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection

    Authors: Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang

    Abstract: The relation modeling between actors and scene context advances video action detection where the correlation of multiple actors makes their action recognition challenging. Existing studies model each actor and scene relation to improve action recognition. However, the scene variations and background interference limit the effectiveness of this relation modeling. In this paper, we propose to select…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: technical report

  17. arXiv:2301.10051

    cs.CV

    Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism

    Authors: Zanjia Tong, Yuhang Chen, Zewei Xu, Rong Yu

    Abstract: The loss function for bounding box regression (BBR) is essential to object detection. A well-defined loss brings significant performance improvement to the model. Most existing works assume that the examples in the training data are high-quality and focus on strengthening the fitting ability of BBR loss. If we blindly strengthen BBR on low-quality examples, it will jeopardize localization perf…

    Submitted 8 April, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

  18. arXiv:2212.03499

    cs.CV cs.AI

    Learning Continuous Depth Representation via Geometric Spatial Aggregator

    Authors: Xiaohang Wang, Xuanhong Chen, Bingbing Ni, Zhengyan Tong, Hang Wang

    Abstract: Depth map super-resolution (DSR) has been a fundamental task for 3D computer vision. While arbitrary scale DSR is a more realistic setting in this scenario, previous approaches predominantly suffer from the issue of inefficient real-numbered scale upsampling. To explicitly address this issue, we propose a novel continuous depth representation for DSR. The heart of this representation is our propos…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023. Code is available at https://github.com/nana01219/GeoDSR

    ACM Class: I.4

  19. arXiv:2209.13219

    cs.CV cs.LG cs.MM

    Im2Oil: Stroke-Based Oil Painting Rendering with Linearly Controllable Fineness Via Adaptive Sampling

    Authors: Zhengyan Tong, Xiaohang Wang, Shengchao Yuan, Xuanhong Chen, Junjie Wang, Xiangzhong Fang

    Abstract: This paper proposes a novel stroke-based rendering (SBR) method that translates images into vivid oil paintings. Previous SBR techniques usually formulate the oil painting problem as pixel-wise approximation. Departing from this approach, we treat oil painting creation as an adaptive sampling problem. Firstly, we compute a probability density map based on the texture complexity of the input…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: ACM MM 2022 oral paper, accepted by the 30th ACM International Conference on Multimedia

  20. arXiv:2208.14062

    cs.CR

    Attack detection based on machine learning algorithms for different variants of Spectre attacks and different Meltdown attack implementations

    Authors: Zhongkai Tong, Ziyuan Zhu, Yusha Zhang, Yuxin Liu, Dan Meng

    Abstract: To improve the overall performance of processors, computer architects use various performance optimization techniques in modern processors, such as speculative execution, branch prediction, and out-of-order execution. Both now and in the future, these optimization techniques are critical for improving the execution speed of processor instructions. However, researchers have discovered that these techniq…

    Submitted 30 August, 2022; originally announced August 2022.

  21. arXiv:2208.00775

    cs.CV

    Pavementscapes: a large-scale hierarchical image dataset for asphalt pavement damage segmentation

    Authors: Zheng Tong, Tao Ma, Ju Huyan, Weiguang Zhang

    Abstract: Pavement damage segmentation has benefited enormously from deep learning. However, the scarcity of public datasets limits the exploration of deep learning for pavement damage segmentation. To address this problem, this study proposes Pavementscapes, a large-scale dataset to develop and evaluate methods for pavement damage segmentation. Pavemen…

    Submitted 23 July, 2022; originally announced August 2022.

  22. arXiv:2205.13535

    cs.CV

    AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

    Authors: Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo

    Abstract: Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A natural next step is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and memory storage. Each model needs an independent and complete finetuning process to adapt to different tasks, which limits its transferability to different visual d…

    Submitted 14 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted by NeurIPS 2022. Code: https://github.com/ShoufaChen/AdaptFormer

  23. arXiv:2205.12627

    cs.CV

    Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives

    Authors: Xinke Li, Henghui Ding, Zekun Tong, Yuwei Wu, Yeow Meng Chee

    Abstract: Numerous advancements in deep learning can be attributed to the access to large-scale and well-annotated datasets. However, such datasets are prohibitively expensive in 3D computer vision due to the substantial collection cost. To alleviate this issue, we propose a cost-effective method for automatically generating a large number of 3D objects with annotations. In particular, we synthesize objects…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR 2022

  24. arXiv:2203.12602

    cs.CV

    VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

    Authors: Zhan Tong, Yibing Song, Jue Wang, Limin Wang

    Abstract: Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets. In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking with an extremely high ratio. This…

    Submitted 18 October, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: NeurIPS 2022 camera-ready version

  25. arXiv:2203.03796

    cs.CV

    PAMI-AD: An Activity Detector Exploiting Part-attention and Motion Information in Surveillance Videos

    Authors: Yunhao Du, Zhihang Tong, Junfeng Wan, Binyu Zhang, Yanyun Zhao

    Abstract: Activity detection in surveillance videos is challenging because of small objects, complex activity categories, the untrimmed nature of the videos, etc. Existing methods are generally limited in performance due to inaccurate proposals, poor classifiers or inadequate post-processing methods. In this work, we propose a comprehensive and effective activity detection system in untrimmed surveillance videos for…

    Submitted 9 May, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: ICME 2022 Workshop

  26. arXiv:2203.01682

    cs.CV

    Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains

    Authors: Yongxing Dai, Yifan Sun, Jun Liu, Zekun Tong, Yi Yang, Ling-Yu Duan

    Abstract: Cross-domain person re-identification (re-ID), such as unsupervised domain adaptive (UDA) re-ID, aims to transfer the identity-discriminative knowledge from the source to the target domain. Existing methods commonly treat the source and target domains as isolated from each other, i.e., no intermediate status is modeled between both domains. Directly transferring the knowledge between two isola…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: This work is a journal extension of our ICCV 2021 Oral paper https://openaccess.thecvf.com/content/ICCV2021/html/Dai_IDM_An_Intermediate_Domain_Module_for_Domain_Adaptive_Person_Re-ID_ICCV_2021_paper.html

  27. arXiv:2202.11983

    cs.CV

    GIAOTracker: A comprehensive framework for MCMOT with global information and optimizing strategies in VisDrone 2021

    Authors: Yunhao Du, Junfeng Wan, Yanyun Zhao, Binyu Zhang, Zhihang Tong, Junhao Dong

    Abstract: In recent years, algorithms for multiple object tracking tasks have benefited from great progress in deep models and video quality. However, in challenging scenarios like drone videos, they still suffer from problems, such as small objects, camera movements and view changes. In this paper, we propose a new multiple object tracker, which employs Global Information And some Optimizing strategies,…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: ICCV 2021 Workshop

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 2809-2819

  28. arXiv:2202.07800

    cs.CV cs.LG

    Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

    Authors: Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, Pengtao Xie

    Abstract: Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head self-attention (MHSA) among them. Complete leverage of these image tokens brings redundant computations since not all the tokens are attentive in MHSA. For example, tokens containing semantically meaningless or distracting image backgrounds do not contribute positively to the ViT predictions. In this…

    Submitted 13 April, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: ICLR 2022 Spotlight

  29. arXiv:2108.10233

    cs.CV cs.AI

    Fusion of evidential CNN classifiers for image classification

    Authors: Zheng Tong, Philippe Xu, Thierry Denoeux

    Abstract: We propose an information-fusion approach based on belief functions to combine convolutional neural networks. In this approach, several pre-trained DS-based CNN architectures extract features from input images and convert them into mass functions on different frames of discernment. A fusion module then aggregates these mass functions using Dempster's rule. An end-to-end learning procedure allows u…

    Submitted 23 August, 2021; originally announced August 2021.

  30. arXiv:2108.02413

    cs.CV

    IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID

    Authors: Yongxing Dai, Jun Liu, Yifan Sun, Zekun Tong, Chi Zhang, Ling-Yu Duan

    Abstract: Unsupervised domain adaptive person re-identification (UDA re-ID) aims at transferring the labeled source domain's knowledge to improve the model's discriminability on the unlabeled target domain. From a novel perspective, we argue that the bridging between the source and target domains can be utilized to tackle the UDA re-ID task, and we focus on explicitly modeling appropriate intermediate domai…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV 2021 (Oral)

  31. arXiv:2105.09156

    cs.CV

    Generalizable Person Re-identification with Relevance-aware Mixture of Experts

    Authors: Yongxing Dai, Xiaotong Li, Jun Liu, Zekun Tong, Ling-Yu Duan

    Abstract: Domain generalizable (DG) person re-identification (ReID) is a challenging problem because we cannot access any unseen target domain data during training. Almost all the existing DG ReID methods follow the same pipeline where they use a hybrid dataset from multiple source domains for training, and then directly apply the trained model to the unseen target domains for testing. These methods often n…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2021

  32. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee…

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021

  33. arXiv:2104.09952

    cs.CV

    MGSampler: An Explainable Sampling Strategy for Video Action Recognition

    Authors: Yuan Zhi, Zhan Tong, Limin Wang, Gangshan Wu

    Abstract: Frame sampling is a fundamental problem in video action recognition due to the essential redundancy in time and limited computation resources. The existing sampling strategy often employs a fixed frame selection and lacks the flexibility to deal with complex variations in videos. In this paper, we present a simple, sparse, and explainable frame sampler, termed as Motion-Guided Sampler (MGSampler).…

    Submitted 20 August, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: ICCV 2021 camera ready version

  34. arXiv:2103.16074

    cs.LG cs.CR cs.CV

    PointBA: Towards Backdoor Attacks in 3D Point Cloud

    Authors: Xinke Li, Zhirui Chen, Yue Zhao, Zekun Tong, Yabang Zhao, Andrew Lim, Joey Tianyi Zhou

    Abstract: 3D deep learning has become increasingly popular for a variety of tasks including many safety-critical applications. However, several recent works have raised security issues with 3D deep models. Although most of them consider adversarial attacks, we identify that backdoor attack is indeed a more serious threat to 3D deep learning systems but remains unexplored. We present the backdoor attacks in…

    Submitted 22 August, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted by ICCV 2021

  35. An evidential classifier based on Dempster-Shafer theory and deep learning

    Authors: Zheng Tong, Philippe Xu, Thierry Denœux

    Abstract: We propose a new classifier based on Dempster-Shafer (DS) theory and a convolutional neural network (CNN) architecture for set-valued classification. In this classifier, called the evidential deep-learning classifier, convolutional and pooling layers first extract high-dimensional features from input data. The features are then converted into mass functions and aggregated by Dempster's rule in a D…

    Submitted 24 March, 2021; originally announced March 2021.

    Journal ref: Neurocomputing, Vol. 450, pages 275-293, 2021

  36. Evidential fully convolutional network for semantic segmentation

    Authors: Zheng Tong, Philippe Xu, Thierry Denœux

    Abstract: We propose a hybrid architecture composed of a fully convolutional network (FCN) and a Dempster-Shafer layer for image semantic segmentation. In the so-called evidential FCN (E-FCN), an encoder-decoder architecture first extracts pixel-wise feature maps from an input image. A Dempster-Shafer layer then computes mass functions at each pixel location based on distances to prototypes. Finally, a util…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: 34 pages, 21 figures

    Journal ref: Applied Intelligence, volume 51, pages 6376-6399 (2021)

  37. Dual-Refinement: Joint Label and Feature Refinement for Unsupervised Domain Adaptive Person Re-Identification

    Authors: Yongxing Dai, Jun Liu, Yan Bai, Zekun Tong, Ling-Yu Duan

    Abstract: Unsupervised domain adaptive (UDA) person re-identification (re-ID) is a challenging task due to the lack of labels for the target domain data. To handle this problem, some recent works adopt clustering algorithms to generate pseudo labels offline, which can then be used as the supervision signal for on-line feature learning in the target domain. However, the off-line generated labels often co…

    Submitted 17 January, 2021; v1 submitted 26 December, 2020; originally announced December 2020.

    Comments: 14 pages, 5 figures

  38. arXiv:2012.10071

    cs.CV

    TDN: Temporal Difference Networks for Efficient Action Recognition

    Authors: Limin Wang, Zhan Tong, Bin Ji, Gangshan Wu

    Abstract: Temporal modeling still remains challenging for action recognition in videos. To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition. The core of our TDN is to devise an efficient temporal module (TDM) by explicitly leveraging a temporal difference…

    Submitted 31 March, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

    Comments: CVPR2021 camera-ready version

  39. arXiv:2012.09700

    cs.CV

    RainNet: A Large-Scale Imagery Dataset and Benchmark for Spatial Precipitation Downscaling

    Authors: Xuanhong Chen, Kairui Feng, Naiyuan Liu, Bingbing Ni, Yifan Lu, Zhengyan Tong, Ziang Liu

    Abstract: AI-for-science approaches have been applied to solve scientific problems (e.g., nuclear fusion, ecology, genomics, meteorology) and have achieved highly promising results. Spatial precipitation downscaling is one of the most important meteorological problems and urgently requires the participation of AI. However, the lack of a well-organized and annotated large-scale dataset hinders the training an…

    Submitted 14 October, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: Accepted at NeurIPS 2022. Project page: https://neuralchen.github.io/RainNet/

    Journal ref: Conference on Neural Information Processing Systems (NeurIPS) 2022

  40. arXiv:2012.09004

    cs.CV

    Sketch Generation with Drawing Process Guided by Vector Flow and Grayscale

    Authors: Zhengyan Tong, Xuanhong Chen, Bingbing Ni, Xiaohang Wang

    Abstract: We propose a novel image-to-pencil translation method that could not only generate high-quality pencil sketches but also offer the drawing process. Existing pencil sketch algorithms are based on texture rendering rather than the direct imitation of strokes, making them unable to show the drawing process but only a final result. To address this challenge, we first establish a pencil stroke imitatio…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: This paper has been accepted for presentation at the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

  41. arXiv:2008.04968

    cs.CV cs.LG eess.IV

    Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene

    Authors: Xinke Li, Chongshou Li, Zekun Tong, Andrew Lim, Junsong Yuan, Yuwei Wu, Jing Tang, Raymond Huang

    Abstract: Learning on 3D scene-based point clouds has received extensive attention owing to its promising applications in many fields, and well-annotated and multisource datasets can catalyze the development of those data-driven approaches. To facilitate the research of this area, we present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks and also an effective learning frame…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: Accepted by the 28th ACM International Conference on Multimedia (ACM MM 2020)

    Journal ref: Proceedings of the 28th ACM International Conference on Multimedia 2020

  42. arXiv:2004.13970  [pdf, other]

    cs.LG stat.ML

    Directed Graph Convolutional Network

    Authors: Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, Andrew Lim

    Abstract: Graph Convolutional Networks (GCNs) have been widely used due to their outstanding performance in processing graph-structured data. However, the undirected graphs limit their application scope. In this paper, we extend spectral-based graph convolution to directed graphs by using first- and second-order proximity, which can not only retain the connection properties of the directed graph, but also e…

    Submitted 29 April, 2020; originally announced April 2020.

  43. arXiv:2002.02318  [pdf, other]

    cs.CV cs.LG stat.ML

    Fine-Grained Urban Flow Inference

    Authors: Kun Ouyang, Yuxuan Liang, Ye Liu, Zekun Tong, Sijie Ruan, Yu Zheng, David S. Rosenblum

    Abstract: The ubiquitous deployment of monitoring devices in urban flow monitoring systems induces a significant cost for maintenance and operation. A technique is required to reduce the number of deployed devices, while preventing the degeneration of data accuracy and granularity. In this paper, we present an approach for inferring the real-time and fine-grained crowd flows throughout a city based on coars…

    Submitted 4 February, 2020; originally announced February 2020.

    Comments: 16 pages. arXiv admin note: substantial text overlap with arXiv:1902.05377

  44. arXiv:1910.04926  [pdf, other]

    cs.IT

    Preconditioned Multiple Orthogonal Least Squares and Applications in Ghost Imaging via Sparsity Constraint

    Authors: Zhishen Tong, Jian Wang, Shensheng Han

    Abstract: Ghost imaging via sparsity constraint (GISC) can recover objects from the intensity fluctuation of light fields even when the sampling rate is far below the Nyquist sampling rate. In this paper, we develop an efficient algorithm called the preconditioned multiple orthogonal least squares (PmOLS) for solving the GISC reconstruction problem. Our analysis shows that the PmOLS algorithm perfectly reco…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: 17 pages

  45. arXiv:1707.09890  [pdf, other]

    cs.SD

    Bearing fault diagnosis under varying working condition based on domain adaptation

    Authors: Bo Zhang, Wei Li, Zhe Tong, Meng Zhang

    Abstract: Traditional intelligent fault diagnosis of rolling bearings works well only under a common assumption that the labeled training data (source domain) and unlabeled testing data (target domain) are drawn from the same distribution. When the distribution changes, most fault diagnosis models need to be rebuilt from scratch using newly recollected labeled training data. However, it is expensive or impos…

    Submitted 31 July, 2017; originally announced July 2017.

  46. arXiv:1502.07404  [pdf, other]

    cs.IT

    Throughput Analysis for Full-Duplex Wireless Networks with Imperfect Self-interference Cancellation

    Authors: Zhen Tong, Martin Haenggi

    Abstract: This paper investigates the throughput for wireless networks with full-duplex radios using stochastic geometry. Full-duplex (FD) radios can exchange data simultaneously with each other. On the other hand, the downside of FD transmission is that it will inevitably cause extra interference to the network compared to half-duplex (HD) transmission. Moreover, the residual self-interference has negative…

    Submitted 25 February, 2015; originally announced February 2015.

    Comments: 6 figures. arXiv admin note: substantial text overlap with arXiv:1409.7433

  47. arXiv:1409.7433  [pdf, other]

    cs.IT cs.NI

    Throughput Analysis for Wireless Networks with Full-Duplex Radios

    Authors: Zhen Tong, Martin Haenggi

    Abstract: This paper investigates the throughput for wireless networks with full-duplex radios using stochastic geometry. Full-duplex (FD) radios can exchange data simultaneously with each other. On the other hand, the downside of FD transmission is that it will inevitably cause extra interference to the network compared to half-duplex (HD) transmission. In this paper, we focus on a wireless network of nodes…

    Submitted 25 September, 2014; originally announced September 2014.

    Comments: 4 figures