Showing 101–150 of 885 results for author: Tao, D

  1. arXiv:2311.17957  [pdf, other]

    cs.CV

    HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

    Authors: Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, Dacheng Tao

    Abstract: Diffusion models have achieved remarkable success in generating realistic images but suffer from generating accurate human hands, such as incorrect finger counts or irregular shapes. This difficulty arises from the complex task of learning the physical structure and pose of hands from training images, which involves extensive deformations and occlusions. For correct hand generation, our paper intr…

    Submitted 29 November, 2023; originally announced November 2023.

  2. arXiv:2311.16714  [pdf, other]

    cs.CV

    Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

    Authors: Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, Yuhui Shi

    Abstract: While large language models (LLMs) excel in a simulated world of texts, they struggle to interact with the more realistic world without perceptions of other modalities such as visual or audio signals. Although vision-language models (VLMs) integrate LLM modules (1) aligned with static image features, and (2) may possess prior knowledge of world dynamics (as demonstrated in the text world), they ha…

    Submitted 29 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  3. arXiv:2311.15744  [pdf, other]

    cs.CV

    One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls

    Authors: Minghui Hu, Jianbin Zheng, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham

    Abstract: It is well known that many open-released foundational diffusion models have difficulty in generating images that substantially depart from average brightness, despite such images being present in the training data. This is due to an inconsistency: while denoising starts from pure Gaussian noise during inference, the training noise schedule retains residual data even in the final timestep distribut…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Project Page: https://jabir-zheng.github.io/OneMoreStep/, Demo Page: https://huggingface.co/spaces/h1t/oms_sdxl_lcm

  4. arXiv:2311.15200  [pdf, other]

    cs.CV cs.LG

    SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification

    Authors: Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong

    Abstract: Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the considerable discrepancies between single- and multi-label images, i.e., a multi-label image involves multiple co-occurred categories and fickle object scales. On the other hand, previous…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 13 pages, 10 figures

  5. arXiv:2311.14756  [pdf, other]

    cs.LG cs.AI

    Task-Distributionally Robust Data-Free Meta-Learning

    Authors: Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Baoyuan Wu, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. Existing inversion-based DFML methods construct pseudo tasks from a learnable dataset, which is inversely generated from the pre-trained model pool. For the first time, we reveal two major challenges hindering their practical deployments: Task…

    Submitted 23 November, 2023; originally announced November 2023.

  6. arXiv:2311.13254  [pdf, other]

    cs.CV cs.AI eess.IV

    DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

    Authors: Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, Dacheng Tao, Tianyou Chai

    Abstract: Video semantic segmentation is a pivotal aspect of video representation learning. However, significant domain shifts present a challenge in effectively learning invariant spatio-temporal features across the labeled source domain and unlabeled target domain for video semantic segmentation. To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, whic…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 18 pages, 9 figures

  7. arXiv:2311.07203  [pdf, other]

    quant-ph cs.AI physics.optics

    Optical Quantum Sensing for Agnostic Environments via Deep Learning

    Authors: Zeqiao Zhou, Yuxuan Du, Xu-Fei Yin, Shanshan Zhao, Xinmei Tian, Dacheng Tao

    Abstract: Optical quantum sensing promises measurement precision beyond classical sensors termed the Heisenberg limit (HL). However, conventional methodologies often rely on prior knowledge of the target system to achieve HL, presenting challenges in practical applications. Addressing this limitation, we introduce an innovative Deep Learning-based Quantum Sensing scheme (DQS), enabling optical quantum senso…

    Submitted 13 November, 2023; originally announced November 2023.

  8. arXiv:2311.05782  [pdf, other]

    cs.DC

    MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications

    Authors: Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, Ang Li

    Abstract: Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is the support of mixed-precision enabled GEMM. For DNN models, lower-precision FP data formats and computation offer acceptable correctness but significan…

    Submitted 9 November, 2023; originally announced November 2023.

  9. arXiv:2311.03713  [pdf, other]

    quant-ph cs.CV cs.LG

    Multimodal deep representation learning for quantum cross-platform verification

    Authors: Yang Qian, Yuxuan Du, Zhenliang He, Min-hsiu Hsieh, Dacheng Tao

    Abstract: Cross-platform verification, a critical undertaking in the realm of early-stage quantum computing, endeavors to characterize the similarity of two imperfect quantum devices executing identical algorithms, utilizing minimal measurements. While the random measurement approach has been instrumental in this context, the quasi-exponential computational demand with increasing qubit count hurdles its fea…

    Submitted 6 November, 2023; originally announced November 2023.

  10. arXiv:2310.20369  [pdf, other]

    cs.LG math.OC

    Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm

    Authors: Miaoxi Zhu, Li Shen, Bo Du, Dacheng Tao

    Abstract: The growing size of available data has attracted increasing interest in solving minimax problems in a decentralized manner for various machine learning tasks. Previous theoretical research has primarily focused on the convergence rate and communication complexity of decentralized minimax algorithms, with little attention given to their generalization. In this paper, we investigate the primal-dual…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  11. arXiv:2310.19142  [pdf, other]

    cs.LG

    MAG-GNN: Reinforcement Learning Boosted Graph Neural Network

    Authors: Lecheng Kong, Jiarui Feng, Hao Liu, Dacheng Tao, Yixin Chen, Muhan Zhang

    Abstract: While Graph Neural Networks (GNNs) recently became powerful tools in graph learning tasks, considerable efforts have been spent on improving GNNs' structural encoding ability. A particular line of work proposed subgraph GNNs that use subgraph information to improve GNNs' expressivity and achieved great success. However, such effectivity sacrifices the efficiency of GNNs by enumerating all possible…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  12. arXiv:2310.14181  [pdf, other]

    eess.AS

    A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: Counseling is carried out as spoken conversation between a therapist and a client. The empathy level expressed by the therapist is considered an important index of the quality of counseling and often assessed by an observer or the client. This research investigates the entrainment of speech prosody in relation to subjectively rated empathy. Experimental results show that the entrainment of intensi…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted by INTERSPEECH 2023

  13. arXiv:2310.14178  [pdf, other]

    eess.AS

    Modeling Intrapersonal and Interpersonal Influences for Automatic Estimation of Therapist Empathy in Counseling Conversation

    Authors: Dehua Tao, Tan Lee, Harold Chui, Sarah Luk

    Abstract: Counseling is usually conducted through spoken conversation between a therapist and a client. The empathy level of therapist is a key indicator of outcomes. Presuming that therapist's empathy expression is shaped by their past behavior and their perception of the client's behavior, we propose a model to estimate the therapist empathy by considering both intrapersonal and interpersonal influences.…

    Submitted 22 October, 2023; originally announced October 2023.

  14. arXiv:2310.13315  [pdf, other]

    cs.CL

    Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

    Authors: Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. While having no access to original training data due to security and privacy concerns has emerged the demand for zero-shot quantization. Most of the cutting-edge zero-shot quantization methods primarily 1) apply to computer vision tasks, and…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP2023 (Main). Miaoxi Zhu and Qihuang Zhong contribute equally to this work

  15. arXiv:2310.11866  [pdf, ps, other]

    cs.LG

    Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function

    Authors: Liu Liu, Xuanqing Liu, Cho-Jui Hsieh, Dacheng Tao

    Abstract: Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties for non-convex optimization by concurrently computing function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations help largely reduce the computational cost, it is challenging to theoreti…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:1809.09853

  16. arXiv:2310.09832  [pdf, other]

    cs.CL

    Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

    Authors: Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao

    Abstract: Scaling the size of language models usually leads to remarkable advancements in NLP tasks. But it often comes with a price of growing computational cost. Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly if increasing the number of activated experts, limiting its pract…

    Submitted 21 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main Conference (Oral)

  17. arXiv:2310.09762  [pdf, other]

    cs.CL cs.AI

    Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

    Authors: Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao

    Abstract: The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost. Even in the era of large-scale language models (LLMs), MoE continues to play a crucial role, as some researchers have indicated that GPT-4 adopts the MoE structure to ensure diverse inf…

    Submitted 15 October, 2023; originally announced October 2023.

  18. arXiv:2310.08184  [pdf, other]

    cs.AI cs.LG

    Learn From Model Beyond Fine-Tuning: A Survey

    Authors: Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, Dacheng Tao

    Abstract: Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artifi…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 20 pages, 9 figures

  19. arXiv:2310.07743  [pdf, other]

    cs.CV

    PointHR: Exploring High-Resolution Architectures for 3D Point Cloud Segmentation

    Authors: Haibo Qiu, Baosheng Yu, Yixin Chen, Dacheng Tao

    Abstract: Significant progress has been made recently in point cloud segmentation utilizing an encoder-decoder framework, which initially encodes point clouds into low-resolution representations and subsequently decodes high-resolution predictions. Inspired by the success of high-resolution architectures in image dense prediction, which always maintains a high-resolution representation throughout the entire…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Code is available at https://github.com/haibo-qiu/PointHR

  20. arXiv:2310.07418  [pdf, other]

    cs.LG cs.AI

    Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

    Authors: Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, Dacheng Tao

    Abstract: Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influences of various components within the VRL framework on the agent's plasticity are still poorly understood. In this work, we conduct a syst…

    Submitted 19 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 poster

  21. arXiv:2310.04742  [pdf, other]

    cs.LG

    Parameter Efficient Multi-task Model Fusion with Partial Linearization

    Authors: Anke Tang, Li Shen, Yong Luo, Yibing Zhan, Han Hu, Bo Du, Yixin Chen, Dacheng Tao

    Abstract: Large pre-trained models have enabled significant advances in machine learning and served as foundation components. Model fusion methods, such as task arithmetic, have been proven to be powerful and scalable to incorporate fine-tuned weights from different tasks into a multi-task model. However, efficiently fine-tuning large pre-trained models on multiple downstream tasks remains challenging, lead…

    Submitted 11 March, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

  22. arXiv:2310.03461  [pdf, other]

    cs.LG math.OC

    Which mode is better for federated learning? Centralized or Decentralized

    Authors: Yan Sun, Li Shen, Dacheng Tao

    Abstract: Both centralized and decentralized approaches have shown excellent performance and great application value in federated learning (FL). However, current studies do not provide sufficient evidence to show which one performs better. Although from the optimization perspective, decentralized methods can approach the comparable convergence of centralized methods with less communication, its test perform…

    Submitted 5 October, 2023; originally announced October 2023.

  23. arXiv:2310.03123  [pdf, other]

    cs.LG cs.AI

    Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models

    Authors: Zihao Lin, Yan Sun, Yifan Shi, Xueqian Wang, Lifu Huang, Li Shen, Dacheng Tao

    Abstract: With the blowout development of pre-trained models (PTMs), the efficient tuning of these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning, even a fra…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 20 pages, 6 figures, preprint

  24. arXiv:2310.02575  [pdf, other]

    cs.LG cs.CV

    AdaMerging: Adaptive Model Merging for Multi-Task Learning

    Authors: Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

    Abstract: Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to…

    Submitted 28 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: International Conference on Learning Representations (ICLR 2024)

  25. arXiv:2310.00149  [pdf, other]

    cs.LG

    One for All: Towards Training One Graph Model for All Classification Tasks

    Authors: Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, Muhan Zhang

    Abstract: Designing a single model to address multiple tasks has been a long-standing objective in artificial intelligence. Recently, large language models have demonstrated exceptional capability in solving different tasks within the language domain. However, a unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain. First, graph data…

    Submitted 18 December, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

  26. arXiv:2309.16980  [pdf, other]

    cs.DC

    Analyzing Impact of Data Reduction Techniques on Visualization for AMR Applications Using AMReX Framework

    Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, James Ahrens, Dingwen Tao

    Abstract: Today's scientific simulations generate exceptionally large volumes of data, challenging the capacities of available I/O bandwidth and storage space. This necessitates a substantial reduction in data volume, for which error-bounded lossy compression has emerged as a highly effective strategy. A crucial metric for assessing the efficacy of lossy compression is visualization. Despite extensive resea…

    Submitted 29 September, 2023; originally announced September 2023.

  27. arXiv:2309.16979  [pdf, other]

    quant-ph cs.ET

    MEMQSim: Highly Memory-Efficient and Modularized Quantum State-Vector Simulation

    Authors: Boyuan Zhang, Bo Fang, Qiang Guan, Ang Li, Dingwen Tao

    Abstract: In this extended abstract, we have introduced a highly memory-efficient state vector simulation of quantum circuits premised on data compression, harnessing the capabilities of both CPUs and GPUs. We have elucidated the inherent challenges in architecting this system, while concurrently proposing our tailored solutions. Moreover, we have delineated our preliminary implementation and deliberated up…

    Submitted 29 September, 2023; originally announced September 2023.

  28. arXiv:2309.16976  [pdf, other]

    cs.LG cs.DC

    Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors

    Authors: Chengming Zhang, Baixi Sun, Xiaodong Yu, Zhen Xie, Weijian Zheng, Kamil Iskra, Pete Beckman, Dingwen Tao

    Abstract: Transformer models have achieved remarkable success in various machine learning tasks but suffer from high computational complexity and resource requirements. The quadratic complexity of the self-attention mechanism further exacerbates these challenges when dealing with long sequences and large datasets. Specialized AI hardware accelerators, such as the Habana GAUDI architecture, offer a promising…

    Submitted 29 September, 2023; originally announced September 2023.

  29. arXiv:2309.16599  [pdf, other]

    cs.CL

    Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

    Authors: Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, Dacheng Tao

    Abstract: Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between unseen language pairs in training data. The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, e.g., <EN> for English and <DE> for German. Recent studies have shown that language IDs s…

    Submitted 28 September, 2023; originally announced September 2023.

  30. arXiv:2309.12641  [pdf, other]

    cs.CV

    Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects

    Authors: Feng Yan, Xiaoheng Jiang, Yang Lu, Lisha Cui, Shupan Li, Jiale Cao, Mingliang Xu, Dacheng Tao

    Abstract: Surface defect inspection is a very challenging task in which surface defects usually show weak appearances or exist under complex backgrounds. Most high-accuracy defect detection methods require expensive computation and storage overhead, making them less practical in some resource-constrained defect detection applications. Although some lightweight methods have achieved real-time inference speed…

    Submitted 22 September, 2023; originally announced September 2023.

  31. arXiv:2309.12639  [pdf, other]

    cs.CV

    CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

    Authors: Xiaoheng Jiang, Kaiyi Guo, Yang Lu, Feng Yan, Hao Liu, Jiale Cao, Mingliang Xu, Dacheng Tao

    Abstract: Surface defect inspection is of great importance for industrial manufacture and production. Though defect inspection methods based on deep learning have made significant progress, there are still some challenges for these methods, such as indistinguishable weak defects and defect-like interference in the background. To address these issues, we propose a transformer network with multi-stage CNN (Co…

    Submitted 22 September, 2023; originally announced September 2023.

  32. arXiv:2309.11166  [pdf, other]

    cs.CL cs.AI

    Are Large Language Models Really Robust to Word-Level Perturbations?

    Authors: Haoyu Wang, Guozheng Ma, Cong Yu, Ning Gui, Linrui Zhang, Zhiqi Huang, Suwei Ma, Yongzhe Chang, Sen Zhang, Li Shen, Xueqian Wang, Peilin Zhao, Dacheng Tao

    Abstract: The swift advancement in the scales and capabilities of Large Language Models (LLMs) positions them as promising tools for a variety of downstream tasks. In addition to the pursuit of better performance and the avoidance of violent feedback on a certain prompt, to ensure the responsibility of the LLM, much attention is drawn to the robustness of LLMs. However, existing evaluation methods mostly re…

    Submitted 27 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  33. arXiv:2309.10376  [pdf, other]

    cs.LG cs.AI

    Graph Contrastive Learning Meets Graph Meta Learning: A Unified Method for Few-shot Node Tasks

    Authors: Hao Liu, Jiarui Feng, Lecheng Kong, Dacheng Tao, Yixin Chen, Muhan Zhang

    Abstract: Graph Neural Networks (GNNs) have become popular in Graph Representation Learning (GRL). One fundamental application is few-shot node classification. Most existing methods follow the meta learning paradigm, showing the ability of fast generalization to few-shot tasks. However, recent works indicate that graph contrastive learning combined with fine-tuning can significantly outperform meta learning…

    Submitted 19 September, 2023; originally announced September 2023.

  34. arXiv:2309.09719  [pdf, other]

    cs.LG cs.DC math.OC

    FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

    Authors: Hao Sun, Li Shen, Shixiang Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao

    Abstract: Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same le…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 40 pages

  35. arXiv:2309.05590  [pdf, other]

    cs.CV cs.AI cs.MM

    Temporal Action Localization with Enhanced Instant Discriminability

    Authors: Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, Dacheng Tao

    Abstract: Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: An extended version of the CVPR paper arXiv:2303.07347, submitted to IJCV

  36. arXiv:2309.03599  [pdf, other]

    cs.CV

    Chasing Consistency in Text-to-3D Generation from a Single Image

    Authors: Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang

    Abstract: Text-to-3D generation from a single-view image is a popular but challenging task in 3D vision. Although numerous methods have been proposed, existing works still suffer from the inconsistency issues, including 1) semantic inconsistency, 2) geometric inconsistency, and 3) saturation inconsistency, resulting in distorted, overfitted, and over-saturated generations. In light of the above issues, we p…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: 9 pages, 11 figures

  37. arXiv:2309.00810  [pdf, other]

    cs.CV cs.AI

    RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

    Authors: Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song

    Abstract: Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions. Text-to-image generation using neural networks could be traced back to the emergence of Generative Adversial Network (GAN), followed by the autoregressive Transformer. Diffusion models are one prominent type of generative model used for the genera…

    Submitted 1 September, 2023; originally announced September 2023.

  38. arXiv:2309.00188  [pdf, other]

    cs.CV

    DARC: Distribution-Aware Re-Coloring Model for Generalizable Nucleus Segmentation

    Authors: Shengcong Chen, Changxing Ding, Dacheng Tao, Hao Chen

    Abstract: Nucleus segmentation is usually the first step in pathological image analysis tasks. Generalizable nucleus segmentation refers to the problem of training a segmentation model that is robust to domain gaps between the source and target domains. The domain gaps are usually believed to be caused by the varied image acquisition conditions, e.g., different scanners, tissues, or staining protocols. In t…

    Submitted 31 August, 2023; originally announced September 2023.

    Comments: Accepted by MICCAI 2023

  39. arXiv:2309.00023  [pdf, other]

    cs.LG

    Continual Learning From a Stream of APIs

    Authors: Enneng Yang, Zhenyi Wang, Li Shen, Nan Yin, Tongliang Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

    Abstract: Continual learning (CL) aims to learn new tasks without forgetting previous tasks. However, existing CL methods require a large amount of raw data, which is often unavailable due to copyright considerations and privacy risks. Instead, stakeholders usually release pre-trained machine learning models as a service (MLaaS), which users can access via APIs. This paper considers two practical-yet-novel…

    Submitted 31 August, 2023; originally announced September 2023.

  40. arXiv:2308.16406  [pdf, other]

    cs.LG

    CktGNN: Circuit Graph Neural Network for Electronic Design Automation

    Authors: Zehao Dong, Weidong Cao, Muhan Zhang, Dacheng Tao, Yixin Chen, Xuan Zhang

    Abstract: The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been paid to automate the transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper pr…

    Submitted 9 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted by ICLR (International Conference on Learning Representations) 2023

  41. arXiv:2308.15982  [pdf, other]

    cs.CL

    MerA: Merging Pretrained Adapters For Few-Shot Learning

    Authors: Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao

    Abstract: Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models to downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment…

    Submitted 30 August, 2023; originally announced August 2023.

  42. arXiv:2308.15817  [pdf]

    physics.optics physics.bio-ph

    Label-free image scanning microscopy for kHz super-resolution imaging and single particle tracking

    Authors: Duc-Minh Ta, Alberto Aguilar, Pierre Bon

    Abstract: We report the modification of a label-free image scanning microscope (ISM) to perform asynchronous 2D imaging at 24kHz while keeping the lateral resolution gain and background rejection of a regular label-free ISM setup. Our method uses a resonant mirror oscillating at 12kHz for one-direction scanning and a chromatic line for instantaneous scanning in the other direction. We adapt optical photon r…

    Submitted 30 August, 2023; originally announced August 2023.

  43. arXiv:2308.15022  [pdf, other]

    cs.CL cs.AI

    Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models

    Authors: Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, Dacheng Tao, Li Guo

    Abstract: Recently, large language models (LLMs), such as GPT-4, have exhibited remarkable conversational abilities, enabling them to engage in dynamic and contextually relevant dialogues across a wide range of topics. However, given a long conversation, these chatbots fail to recall past information and tend to generate inconsistent responses. To address this, we propose to recursively generate summaries/ memor…

    Submitted 18 February, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  44. arXiv:2308.13666  [pdf, other]

    astro-ph.HE

    A Joint Fermi-GBM and Swift-BAT Analysis of Gravitational-Wave Candidates from the Third Gravitational-wave Observing Run

    Authors: C. Fletcher, J. Wood, R. Hamburg, P. Veres, C. M. Hui, E. Bissaldi, M. S. Briggs, E. Burns, W. H. Cleveland, M. M. Giles, A. Goldstein, B. A. Hristov, D. Kocevski, S. Lesage, B. Mailyan, C. Malacaria, S. Poolakkil, A. von Kienlin, C. A. Wilson-Hodge, The Fermi Gamma-ray Burst Monitor Team, M. Crnogorčević, J. DeLaunay, A. Tohuvavohu, R. Caputo, S. B. Cenko, et al. (1674 additional authors not shown)

    Abstract: We present Fermi Gamma-ray Burst Monitor (Fermi-GBM) and Swift Burst Alert Telescope (Swift-BAT) searches for gamma-ray/X-ray counterparts to gravitational wave (GW) candidate events identified during the third observing run of the Advanced LIGO and Advanced Virgo detectors. Using Fermi-GBM on-board triggers and sub-threshold gamma-ray burst (GRB) candidates found in the Fermi-GBM ground analyses,…

    Submitted 25 August, 2023; originally announced August 2023.

  45. arXiv:2308.11994  [pdf, other]

    cs.CV

    Progressive Feature Mining and External Knowledge-Assisted Text-Pedestrian Image Retrieval

    Authors: Huafeng Li, Shedan Yang, Yafei Zhang, Dapeng Tao, Zhengtao Yu

    Abstract: Text-Pedestrian Image Retrieval aims to use the text describing pedestrian appearance to retrieve the corresponding pedestrian image. This task involves not only modality discrepancy, but also the challenge of the textual diversity of pedestrians with the same identity. At present, although existing research progress has been made in text-pedestrian image retrieval, these methods do not comprehens…

    Submitted 23 August, 2023; originally announced August 2023.

  46. arXiv:2308.11290  [pdf, other]

    quant-ph cs.AI cs.LG

    ShadowNet for Data-Centric Quantum System Learning

    Authors: Yuxuan Du, Yibo Yang, Tongliang Liu, Zhouchen Lin, Bernard Ghanem, Dacheng Tao

    Abstract: Understanding the dynamics of large quantum systems is hindered by the curse of dimensionality. Statistical learning offers new possibilities in this regime by neural-network protocols and classical shadows, while both methods have limitations: the former is plagued by the predictive uncertainty and the latter lacks the generalization ability. Here we propose a data-centric learning paradigm combi…

    Submitted 22 August, 2023; originally announced August 2023.

  47. arXiv:2308.09430  [pdf, other]

    cs.LG

    Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent

    Authors: Xiaoge Deng, Li Shen, Shengwei Li, Tao Sun, Dongsheng Li, Dacheng Tao

    Abstract: Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation…

    Submitted 17 December, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

  48. arXiv:2308.08290  [pdf, other]

    cs.LG cs.DC math.OC

    DFedADMM: Dual Constraints Controlled Model Inconsistency for Decentralized Federated Learning

    Authors: Qinglun Li, Li Shen, Guanghao Li, Quanjun Yin, Dacheng Tao

    Abstract: To address the communication burden issues associated with federated learning (FL), decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network, where each client communicates only with neighboring clients. However, existing DFL methods still suffer from two major challenges: local inconsistency and local heterogeneous overfitting, which…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 24 pages

  49. arXiv:2308.05721  [pdf, other]

    cs.CV

    Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction

    Authors: Yangyang Xu, Yibo Yang, Bernard Ghanem, Lefei Zhang, Du Bo, Dacheng Tao

    Abstract: CNNs and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL). Most of the current studies on MTL solely rely on CNN or Transformer. In this work, we present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction. This combination may o…

    Submitted 21 September, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: submitted to IJCV; an extension to our previous AAAI 2023 paper arXiv:2301.03461

  50. arXiv:2308.03822  [pdf, other]

    astro-ph.HE

    Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, et al. (1750 additional authors not shown)

    Abstract: Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effect…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 24 pages, 5 figures

    Report number: LIGO-P2300080