Showing 1–50 of 89 results for author: Zhou, J T

Searching in archive cs.
  1. arXiv:2406.06965  [pdf, other]

    cs.CV

    Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey

    Authors: Ping Liu, Qiqi Tao, Joey Tianyi Zhou

    Abstract: This survey addresses the critical challenge of deepfake detection amidst the rapid advancements in artificial intelligence. As AI-generated media, including video, audio and text, become more realistic, the risk of misuse to spread misinformation and commit identity fraud increases. Focused on face-centric deepfakes, this work traces the evolution from traditional single-modality methods to sophi…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2406.05677  [pdf, other]

    cs.CV

    Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification

    Authors: Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou

    Abstract: In the medical field, managing high-dimensional massive medical imaging data and performing reliable medical analysis from it is a critical challenge, especially in resource-limited environments such as remote medical facilities and mobile devices. This necessitates effective dataset compression techniques to reduce storage, transmission, and computational cost. However, existing coreset selection…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2404.13648  [pdf, other]

    cs.CV cs.LG

    Data-independent Module-aware Pruning for Hierarchical Vision Transformers

    Authors: Yang He, Joey Tianyi Zhou

    Abstract: Hierarchical vision transformers (ViTs) have two advantages over conventional ViTs. First, hierarchical ViTs achieve linear computational complexity with respect to image size by local self-attention. Second, hierarchical ViTs create hierarchical feature maps by merging image patches in deeper layers for dense prediction. However, existing pruning methods ignore the unique properties of hierarchic…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by ICLR 2024

  4. arXiv:2404.00461  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning

    Authors: Xiaopeng Xie, Ming Yan, Xiwen Zhou, Chenlong Zhao, Suli Wang, Yong Zhang, Joey Tianyi Zhou

    Abstract: Prompt-based learning paradigm has demonstrated remarkable efficacy in enhancing the adaptability of pretrained language models (PLMs), particularly in few-shot scenarios. However, this learning paradigm has been shown to be vulnerable to backdoor attacks. The current clean-label attack, employing a specific prompt as a trigger, can achieve success without the need for external triggers and ensure…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures, conference

    MSC Class: 68T50 ACM Class: I.2.7

  5. Collaborative Knowledge Infusion for Low-resource Stance Detection

    Authors: Ming Yan, Joey Tianyi Zhou, Ivor W. Tsang

    Abstract: Stance detection is the view towards a specific target by a given context (e.g., tweets, commercial reviews). Target-related knowledge is often needed to assist stance detection models in understanding the target well and making detection correctly. However, prevailing works for knowledge-infused stance detection predominantly incorporate target knowledge from a singular source that lacks…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 13 pages, 3 figures, Big Data Mining and Analysis

  6. arXiv:2403.10082  [pdf, other]

    cs.CV

    CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner

    Authors: Tingbing Yan, Wenzheng Zeng, Yang Xiao, Xingyu Tong, Bo Tan, Zhiwen Fang, Zhiguo Cao, Joey Tianyi Zhou

    Abstract: Most existing one-shot skeleton-based action recognition focuses on raw low-level information (e.g., joint location), and may suffer from local information loss and low generalization ability. To alleviate these, we propose to leverage text description generated from large language models (LLM) that contain high-level human knowledge, to guide feature learning, in a global-local-global way. Partic…

    Submitted 15 March, 2024; originally announced March 2024.

  7. arXiv:2403.06075  [pdf, other]

    cs.CV

    Multisize Dataset Condensation

    Authors: Yang He, Lingao Xiao, Joey Tianyi Zhou, Ivor Tsang

    Abstract: While dataset condensation effectively enhances training efficiency, its application in on-device scenarios brings unique challenges. 1) Due to the fluctuating computational resources of these devices, there's a demand for a flexible dataset size that diverges from a predefined size. 2) The limited computational power on devices often prevents additional condensation operations. These two challeng…

    Submitted 14 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024 Oral

  8. arXiv:2402.08384  [pdf, other]

    cs.LG cs.AI

    Selective Learning: Towards Robust Calibration with Dynamic Regularization

    Authors: Zongbo Han, Yifeng Yang, Changqing Zhang, Linjun Zhang, Joey Tianyi Zhou, Qinghua Hu, Huaxiu Yao

    Abstract: Miscalibration in deep learning refers to a discrepancy between the predicted confidence and performance. This problem usually arises due to the overfitting problem, which is characterized by learning everything presented in the training set, resulting in overconfident predictions during testing. Existing methods typically address overfitting and mitigate the miscalibration by adding a ma…

    Submitted 13 February, 2024; originally announced February 2024.

  9. arXiv:2402.04924  [pdf, other]

    cs.LG

    Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching

    Authors: Tianle Zhang, Yuchen Zhang, Kun Wang, Kai Wang, Beining Yang, Kaipeng Zhang, Wenqi Shao, Ping Liu, Joey Tianyi Zhou, Yang You

    Abstract: Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have raised growing concerns. As one of the most promising directions, graph condensation methods address these issues by employing gradient matching, aiming to condense the full graph into a more concise yet information-rich synthetic set. Though encouraging, these strategies…

    Submitted 30 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: An effective method for graph condensation

  10. arXiv:2401.15902  [pdf, other]

    cs.CV

    A Concise but High-performing Network for Image Guided Depth Completion in Autonomous Driving

    Authors: Moyun Liu, Bing Chen, Youping Chen, Jingming Xie, Lei Yao, Yang Zhang, Joey Tianyi Zhou

    Abstract: Depth completion is a crucial task in autonomous driving, aiming to convert a sparse depth map into a dense depth prediction. Due to its potentially rich semantic information, RGB image is commonly fused to enhance the completion effect. Image-guided depth completion involves three key challenges: 1) how to effectively fuse the two modalities; 2) how to better recover depth information; and 3) how…

    Submitted 22 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  11. arXiv:2401.08977  [pdf, ps, other]

    cs.LG cs.AI

    FedLoGe: Joint Local and Generic Federated Learning under Long-tailed Data

    Authors: Zikai Xiao, Zihan Chen, Liyinglan Liu, Yang Feng, Jian Wu, Wanlu Liu, Joey Tianyi Zhou, Howard Hao Yang, Zuozhu Liu

    Abstract: Federated Long-Tailed Learning (Fed-LT), a paradigm wherein data collected from decentralized local clients manifests a globally prevalent long-tailed distribution, has garnered considerable attention in recent times. In the context of Fed-LT, existing works have predominantly centered on addressing the data imbalance issue to enhance the efficacy of the generic global model while neglecting the p…

    Submitted 8 March, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024, code: https://github.com/ZackZikaiXiao/FedLoGe

    ACM Class: I.2.0

  12. arXiv:2401.06826  [pdf, other]

    cs.LG cs.AI cs.CV

    Direct Distillation between Different Domains

    Authors: Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama

    Abstract: Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. However, in practical applications, the student network may be required to perform in a new scenario (i.e., the target domain), which usually exhibits significant differences from the known scenario of the…

    Submitted 11 January, 2024; originally announced January 2024.

  13. arXiv:2311.13613  [pdf, other]

    cs.CV cs.LG

    Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

    Authors: Xin Zhang, Jiawei Du, Yunsong Li, Weiying Xie, Joey Tianyi Zhou

    Abstract: Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dyn…

    Submitted 28 May, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR2024

  14. arXiv:2311.13234  [pdf, other]

    cs.CV cs.AI

    TSegFormer: 3D Tooth Segmentation in Intraoral Scans with Geometry Guided Transformer

    Authors: Huimin Xiong, Kunle Li, Kaiyuan Tan, Yang Feng, Joey Tianyi Zhou, Jin Hao, Haochao Ying, Jian Wu, Zuozhu Liu

    Abstract: Optical Intraoral Scanners (IOS) are widely used in digital dentistry to provide detailed 3D information of dental crowns and the gingiva. Accurate 3D tooth segmentation in IOSs is critical for various dental applications, while previous methods are error-prone at complicated boundaries and exhibit unsatisfactory results across patients. In this paper, we propose TSegFormer which captures both loc…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: MICCAI 2023, STAR(Student Travel) award. 11 pages, 3 figures, 5 tables. arXiv admin note: text overlap with arXiv:2210.16627

  15. arXiv:2311.01570  [pdf, other]

    cs.CV cs.LG

    Sequential Subset Matching for Dataset Distillation

    Authors: Jiawei Du, Qin Shi, Joey Tianyi Zhou

    Abstract: Dataset distillation is a newly emerging task that synthesizes a small-size dataset used in training deep neural networks (DNNs) for reducing data storage and model training costs. The synthetic datasets are expected to capture the essence of the knowledge contained in real-world datasets such that the former yields a similar performance as the latter. Recent advancements in distillation methods h…

    Submitted 2 November, 2023; originally announced November 2023.

  16. arXiv:2310.17468  [pdf, other]

    cs.CV cs.LG

    Cross-modal Active Complementary Learning with Self-refining Correspondence

    Authors: Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, Peng Hu

    Abstract: Recently, image-text matching has attracted more and more attention from academia and industry, which is fundamental to understanding the latent correspondence across visual and textual modalities. However, most existing methods implicitly assume the training pairs are well-aligned while ignoring the ubiquitous annotation noise, a.k.a noisy correspondence (NC), thereby inevitably leading to a perf…

    Submitted 7 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: This paper is accepted by NeurIPS 2023

  17. arXiv:2310.14019  [pdf, other]

    cs.CV

    You Only Condense Once: Two Rules for Pruning Condensed Datasets

    Authors: Yang He, Lingao Xiao, Joey Tianyi Zhou

    Abstract: Dataset condensation is a crucial tool for enhancing training efficiency by reducing the size of the training dataset, particularly in on-device scenarios. However, these scenarios have two significant challenges: 1) the varying computational resources available on the devices require a dataset size different from the pre-defined condensed dataset, and 2) the limited computational resources often…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  18. arXiv:2310.12560  [pdf, other]

    cs.LG

    Fast Model Debias with Machine Unlearning

    Authors: Ruizhe Chen, Jianfei Yang, Huimin Xiong, Jianhong Bai, Tianxiang Hu, Jin Hao, Yang Feng, Joey Tianyi Zhou, Jian Wu, Zuozhu Liu

    Abstract: Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on a large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially conc…

    Submitted 3 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

  19. arXiv:2310.07587  [pdf, other]

    cs.LG cs.AI

    Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

    Authors: Zikai Xiao, Zihan Chen, Songshang Liu, Hualiang Wang, Yang Feng, Jin Hao, Joey Tianyi Zhou, Jian Wu, Howard Hao Yang, Zuozhu Liu

    Abstract: Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or central…

    Submitted 26 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

    ACM Class: I.2.0

  20. arXiv:2310.01376  [pdf, other]

    cs.CV

    Towards Distribution-Agnostic Generalized Category Discovery

    Authors: Jianhong Bai, Zuozhu Liu, Hualiang Wang, Ruizhe Chen, Lianrui Mu, Xiaomeng Li, Joey Tianyi Zhou, Yang Feng, Jian Wu, Haoji Hu

    Abstract: Data imbalance and open-ended distribution are two intrinsic characteristics of the real visual world. Though encouraging progress has been made in tackling each challenge separately, few works are dedicated to combining them towards real-world scenarios. While several previous works have focused on classifying close-set samples and detecting open-set samples during testing, it's still essential to be…

    Submitted 20 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023

  21. arXiv:2309.09724  [pdf, other]

    cs.CV

    Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

    Authors: Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

    Abstract: In this study, we address the challenge of 3D scene structure recovery from monocular depth estimation. While traditional depth estimation methods leverage labeled datasets to directly predict absolute depth, recent advancements advocate for mix-dataset training, enhancing generalization across diverse scenes. However, such mixed dataset training yields depth predictions only up to an unknown scal…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV2023

  22. arXiv:2308.16763  [pdf, other]

    cs.CL cs.AI

    Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection

    Authors: Kairui Hu, Ming Yan, Joey Tianyi Zhou, Ivor W. Tsang, Wen Haw Chong, Yong Keong Yap

    Abstract: Stance detection aims to identify the attitude expressed in a document towards a given target. Techniques such as Chain-of-Thought (CoT) prompting have advanced this task, enhancing a model's reasoning capabilities through the derivation of intermediate rationales. However, CoT relies primarily on a model's pre-trained internal knowledge during reasoning, thereby neglecting the valuable external i…

    Submitted 7 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: 5 pages, 2 figures, 2 tables

  23. arXiv:2308.09911  [pdf, other]

    cs.CV cs.MM

    Noisy-Correspondence Learning for Text-to-Image Person Re-identification

    Authors: Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, Peng Hu

    Abstract: Text-to-image person re-identification (TIReID) is a compelling topic in the cross-modal community, which aims to retrieve the target person based on a textual query. Although numerous TIReID methods have been proposed and achieved promising performance, they implicitly assume the training image-text pairs are correctly aligned, which is not always the case in real-world scenarios. In practice, th…

    Submitted 28 March, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

  24. arXiv:2307.10875  [pdf, other]

    cs.CV cs.CR cs.LG

    Risk-optimized Outlier Removal for Robust 3D Point Cloud Classification

    Authors: Xinke Li, Junchi Lu, Henghui Ding, Changsheng Sun, Joey Tianyi Zhou, Chee Yeow Meng

    Abstract: With the growth of 3D sensing technology, deep learning system for 3D point clouds has become increasingly important, especially in applications like autonomous vehicles where safety is a primary concern. However, there are also growing concerns about the reliability of these systems when they encounter noisy point clouds, whether occurring naturally or introduced with malicious intent. This paper…

    Submitted 1 January, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

  25. arXiv:2306.02050  [pdf, other]

    cs.LG cs.CV

    Provable Dynamic Fusion for Low-Quality Multimodal Data

    Authors: Qingyang Zhang, Haitao Wu, Changqing Zhang, Qinghua Hu, Huazhu Fu, Joey Tianyi Zhou, Xi Peng

    Abstract: The inherent challenge of multimodal fusion is to precisely capture the cross-modal correlation and flexibly conduct cross-modal interaction. To fully release the value of each modality and mitigate the influence of low-quality multimodal data, dynamic multimodal fusion emerges as a promising learning paradigm. Despite its widespread use, theoretical justifications in this field are still notably…

    Submitted 6 June, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted by ICML 2023

  26. arXiv:2306.01452  [pdf, other]

    cs.CV

    dugMatting: Decomposed-Uncertainty-Guided Matting

    Authors: Jiawei Wu, Changqing Zhang, Zuoyong Li, Huazhu Fu, Xi Peng, Joey Tianyi Zhou

    Abstract: Cutting out an object and estimating its opacity mask, known as image matting, is a key task in image and video editing. Due to the highly ill-posed issue, additional inputs, typically user-defined trimaps or scribbles, are usually needed to reduce the uncertainty. Although effective, it is either time consuming or only suitable for experienced users who know where to place the strokes. In this wo…

    Submitted 2 June, 2023; originally announced June 2023.

  27. arXiv:2306.01265  [pdf, other]

    cs.LG

    Calibrating Multimodal Learning

    Authors: Huan Ma, Qingyang Zhang, Changqing Zhang, Bingzhe Wu, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu

    Abstract: Multimodal machine learning has achieved remarkable progress in a wide range of scenarios. However, the reliability of multimodal learning remains largely unexplored. In this paper, through extensive empirical studies, we identify that current multimodal classification methods suffer from unreliable predictive confidence that tends to rely on partial modalities when estimating confidence. Specifically,…

    Submitted 2 June, 2023; originally announced June 2023.

  28. arXiv:2304.09347  [pdf, ps, other]

    cs.CV

    Dual Stage Stylization Modulation for Domain Generalized Semantic Segmentation

    Authors: Gabriel Tjio, Ping Liu, Chee-Keong Kwoh, Joey Tianyi Zhou

    Abstract: Obtaining sufficient labeled data for training deep models is often challenging in real-life applications. To address this issue, we propose a novel solution for single-source domain generalized semantic segmentation. Recent approaches have explored data diversity enhancement using hallucination techniques. However, excessive hallucination can degrade performance, particularly for imbalanced datas…

    Submitted 3 August, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  29. arXiv:2304.03635  [pdf, other]

    cs.CV

    A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image

    Authors: Changlong Jiang, Yang Xiao, Cunlin Wu, Mingyang Zhang, Jinghong Zheng, Zhiguo Cao, Joey Tianyi Zhou

    Abstract: 3D interacting hand pose estimation from a single RGB image is a challenging task, due to serious self-occlusion and inter-occlusion towards hands, confusing similar appearance patterns between 2 hands, ill-posed joint position mapping from 2D to 3D, etc. To address these, we propose to extend A2J (the state-of-the-art depth-based 3D single hand pose estimation method) to RGB domain under interacti…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. The code is available at https://github.com/ChanglongJiangGit/A2J-Transformer

  30. arXiv:2303.16053  [pdf, other]

    cs.CV

    Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video

    Authors: Wenzheng Zeng, Yang Xiao, Sicheng Wei, Jinfang Gan, Xintao Zhang, Zhiguo Cao, Zhiwen Fang, Joey Tianyi Zhou

    Abstract: Real-time eyeblink detection in the wild can widely serve for fatigue detection, face anti-spoofing, emotion analysis, etc. The existing research efforts generally focus on single-person cases towards trimmed video. However, multi-person scenario within untrimmed videos is also important for practical applications, which has not been well concerned yet. To address this, we shed light on this resea…

    Submitted 21 August, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  31. arXiv:2212.07070  [pdf, other]

    cs.CV cs.LG

    Deep Negative Correlation Classification

    Authors: Le Zhang, Qibin Hou, Yun Liu, Jia-Wang Bian, Xun Xu, Joey Tianyi Zhou, Ce Zhu

    Abstract: Ensemble learning serves as a straightforward way to improve the performance of almost any machine learning algorithm. Existing deep ensemble methods usually naively train many different models and then aggregate their predictions. This is not optimal in our view from two aspects: i) Naively training multiple models adds much more computational burden, especially in the deep learning era; ii) Pure…

    Submitted 14 December, 2022; originally announced December 2022.

  32. arXiv:2211.11004  [pdf, other]

    cs.LG cs.AI cs.CV

    Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation

    Authors: Jiawei Du, Yidi Jiang, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li

    Abstract: Model-based deep learning has achieved astounding successes due in part to the availability of large-scale real-world data. However, processing such massive amounts of data comes at a considerable cost in terms of computations, storage, training and the search for good neural architectures. Dataset distillation has thus recently come to the fore. This paradigm involves distilling information from…

    Submitted 25 March, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

  33. arXiv:2210.16627  [pdf, other]

    cs.CV

    TFormer: 3D Tooth Segmentation in Mesh Scans with Geometry Guided Transformer

    Authors: Huimin Xiong, Kunle Li, Kaiyuan Tan, Yang Feng, Joey Tianyi Zhou, Jin Hao, Zuozhu Liu

    Abstract: Optical Intra-oral Scanners (IOS) are widely used in digital dentistry, providing 3-Dimensional (3D) and high-resolution geometrical information of dental crowns and the gingiva. Accurate 3D tooth segmentation, which aims to precisely delineate the tooth and gingiva instances in IOS, plays a critical role in a variety of dental applications. However, segmentation performance of previous methods ar…

    Submitted 29 October, 2022; originally announced October 2022.

  34. arXiv:2209.14851  [pdf, other]

    cs.LG cs.CV

    Meta Knowledge Condensation for Federated Learning

    Authors: Ping Liu, Xin Yu, Joey Tianyi Zhou

    Abstract: Existing federated learning paradigms usually extensively exchange distributed models at a central solver to achieve a more powerful model. However, this would incur severe communication burden between a server and multiple clients especially when data distributions are heterogeneous. As a result, current federated learning methods often require a large number of communication rounds in training.…

    Submitted 29 September, 2022; originally announced September 2022.

  35. arXiv:2205.14083  [pdf, other]

    cs.LG cs.AI cs.CV

    Sharpness-Aware Training for Free

    Authors: Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Y. F. Tan, Joey Tianyi Zhou

    Abstract: Modern deep neural networks (DNNs) have achieved state-of-the-art performances but are typically over-parameterized. The over-parameterization may result in undesirably large generalization error in the absence of other customized training strategies. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the…

    Submitted 2 March, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

  36. arXiv:2204.11423  [pdf, other]

    cs.LG

    Trusted Multi-View Classification with Dynamic Evidential Fusion

    Authors: Zongbo Han, Changqing Zhang, Huazhu Fu, Joey Tianyi Zhou

    Abstract: Existing multi-view classification algorithms focus on promoting accuracy by exploiting different views, typically integrating them into common representations for follow-up tasks. Although effective, it is also crucial to ensure the reliability of both the multi-view integration and the final decision, especially for noisy, corrupted and out-of-distribution data. Dynamically assessing the trustwo…

    Submitted 25 June, 2022; v1 submitted 24 April, 2022; originally announced April 2022.

    Comments: Journal version of arXiv:2102.02051. Accepted by IEEE TPAMI

  37. arXiv:2204.09597  [pdf, other]

    cs.CL cs.AI

    Perceiving the World: Question-guided Reinforcement Learning for Text-based Games

    Authors: Yunqiu Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, Chengqi Zhang

    Abstract: Text-based games provide an interactive way to study natural language processing. While deep reinforcement learning has shown effectiveness in developing the game playing agent, the low sample efficiency and the large action space remain to be the two major challenges that hinder the DRL from being applied in the real world. In this paper, we address the challenges by introducing world-perceiving…

    Submitted 21 April, 2022; v1 submitted 20 March, 2022; originally announced April 2022.

    Comments: ACL2022, fix some typos

  38. arXiv:2203.04313  [pdf, other]

    eess.IV cs.CV

    Multi-Scale Adaptive Network for Single Image Denoising

    Authors: Yuanbiao Gou, Peng Hu, Jiancheng Lv, Joey Tianyi Zhou, Xi Peng

    Abstract: Multi-scale architectures have shown effectiveness in a variety of tasks thanks to appealing cross-scale complementarity. However, existing architectures treat different scale features equally without considering the scale-specific characteristics, i.e., the within-scale characteristics are ignored in the architecture design. In this paper, we reveal this missing piece for multi-scale arc…

    Submitted 29 October, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Journal ref: the Thirty-Sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022)

  39. arXiv:2201.08071  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM

    Temporal Sentence Grounding in Videos: A Survey and Future Directions

    Authors: Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

    Abstract: Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video. Connecting computer vision and natural language, TSGV has drawn significant attention from researchers in both communities. This survey attempts to provide a summa…

    Submitted 13 March, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  40. arXiv:2111.08456  [pdf, other]

    cs.LG cs.AI

    Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions

    Authors: Huan Ma, Zongbo Han, Changqing Zhang, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu

    Abstract: Multimodal regression is a fundamental task, which integrates the information from different sources to improve the performance of follow-up applications. However, existing methods mainly focus on improving the performance and often ignore the confidence of prediction for diverse situations. In this study, we are devoted to trustworthy multimodal regression which is critical in cost-sensitive doma…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: Accepted to NeurIPS 2021

  41. arXiv:2111.04321  [pdf, other]

    cs.CV cs.CL

    Towards Debiasing Temporal Sentence Grounding in Video

    Authors: Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

    Abstract: The temporal sentence grounding in video (TSGV) task is to locate a temporal moment from an untrimmed video, to match a language query, i.e., a sentence. Without considering bias in moment annotations (e.g., start and end positions in a video), many models tend to capture statistical regularities of the moment annotations, and do not well learn cross-modal reasoning between video and language quer…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: 13 pages, 6 figures, 11 tables

  42. arXiv:2110.03141  [pdf, other]

    cs.AI cs.CV cs.LG

    Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

    Authors: Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan

    Abstract: Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization. Unfortu…

    Submitted 28 May, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

  43. arXiv:2108.00205  [pdf, other]

    cs.CV cs.AI cs.CL

    Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding

    Authors: Heng Zhao, Joey Tianyi Zhou, Yew-Soon Ong

    Abstract: Current one-stage methods for visual grounding encode the language query as one holistic sentence embedding before fusion with visual feature. Such a formulation does not treat each word of a query sentence on par when modeling language to visual attention, therefore prone to neglect words which are less important for sentence embedding but critical for visual grounding. In this paper we propose W…

    Submitted 31 July, 2021; originally announced August 2021.

  44. arXiv:2106.10705  [pdf, other]

    cs.CV

    Automated Deepfake Detection

    Authors: Ping Liu, Yuewei Lin, Yang He, Yunchao Wei, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh, Jingen Liu

    Abstract: In this paper, we propose to utilize Automated Machine Learning to adaptively search a neural architecture for deepfake detection. This is the first time to employ automated machine learning for deepfake detection. Based on our explored search space, our proposed method achieves competitive prediction accuracy compared to previous methods. To improve the generalizability of our method, especially…

    Submitted 12 August, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

  45. arXiv:2106.04144  [pdf, ps, other]

    cs.CV

    Adversarial Semantic Hallucination for Domain Generalized Semantic Segmentation

    Authors: Gabriel Tjio, Ping Liu, Joey Tianyi Zhou, Rick Siow Mong Goh

    Abstract: Convolutional neural networks typically perform poorly when the test (target domain) and training (source domain) data have significantly different distributions. While this problem can be mitigated by using the target domain data to align the source and target domain feature representations, the target domain data may be unavailable due to privacy concerns. Consequently, there is a need for metho…

    Submitted 26 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted in WACV 2022

  46. Parallel Attention Network with Sequence Matching for Video Grounding

    Authors: Hao Zhang, Aixin Sun, Wei Jing, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh

    Abstract: Given a video, video grounding aims to retrieve a temporal moment that semantically corresponds to a language query. In this work, we propose a Parallel Attention Network with Sequence matching (SeqPAN) to address the challenges in this task: multi-modal representation learning, and target moment boundary prediction. We design a self-guided parallel attention module to effectively capture self-mod…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: 15 pages, 10 figures, 7 tables, Findings at ACL 2021

  47. arXiv:2105.06943  [pdf, other]

    cs.NE

    Efficient Spiking Neural Networks with Radix Encoding

    Authors: Zhehui Wang, Xiaozhe Gu, Rick Goh, Joey Tianyi Zhou, Tao Luo

    Abstract: Spiking neural networks (SNNs) have advantages in latency and energy efficiency over traditional artificial neural networks (ANNs) due to its event-driven computation mechanism and replacement of energy-consuming weight multiplications with additions. However, in order to reach accuracy of its ANN counterpart, it usually requires long spike trains to ensure the accuracy. Traditionally, a spike tra…

    Submitted 2 November, 2023; v1 submitted 14 May, 2021; originally announced May 2021.

  48. arXiv:2105.06247  [pdf, other]

    cs.CL cs.CV cs.IR

    Video Corpus Moment Retrieval with Contrastive Learning

    Authors: Hao Zhang, Aixin Sun, Wei Jing, Guoshun Nan, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh

    Abstract: Given a collection of untrimmed and unsegmented videos, video corpus moment retrieval (VCMR) is to retrieve a temporal moment (i.e., a fraction of a video) that semantically corresponds to a given text query. As video and text are from two distinct feature spaces, there are two general approaches to address VCMR: (i) to separately encode each modality representations, then align the two modality r…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: 11 pages, 7 figures and 6 tables. Accepted by SIGIR 2021

  49. arXiv:2103.16074  [pdf, other]

    cs.LG cs.CR cs.CV

    PointBA: Towards Backdoor Attacks in 3D Point Cloud

    Authors: Xinke Li, Zhirui Chen, Yue Zhao, Zekun Tong, Yabang Zhao, Andrew Lim, Joey Tianyi Zhou

    Abstract: 3D deep learning has been increasingly more popular for a variety of tasks including many safety-critical applications. However, recently several works raise the security issues of 3D deep models. Although most of them consider adversarial attacks, we identify that backdoor attack is indeed a more serious threat to 3D deep learning systems but remains unexplored. We present the backdoor attacks in…

    Submitted 22 August, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Accepted by ICCV 2021

  50. arXiv:2103.14493  [pdf, other]

    cs.LG cs.NE

    RCT: Resource Constrained Training for Edge AI

    Authors: Tian Huang, Tao Luo, Ming Yan, Joey Tianyi Zhou, Rick Goh

    Abstract: Neural networks training on edge terminals is essential for edge AI computing, which needs to be adaptive to evolving environment. Quantised models can efficiently run on edge devices, but existing training methods for these compact models are designed to run on powerful servers with abundant memory and energy budget. For example, quantisation-aware training (QAT) method involves two copies of mod…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: 14 pages

    MSC Class: 68T07 (Primary) 68T05 (Secondary); ACM Class: I.5.1; I.2.6