
Showing 1–50 of 97 results for author: Yue, Z

  1. arXiv:2406.18516  [pdf, other]

    cs.CV

    Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

    Authors: Kang Liao, Zongsheng Yue, Zhouxia Wang, Chen Change Loy

    Abstract: Although deep learning-based image restoration methods have made significant progress, they still struggle with limited generalization to real-world scenarios due to the substantial domain gap caused by training on synthetic data. Existing methods address this issue by improving data synthesis pipelines, estimating degradation kernels, employing deep internal learning, and performing domain adapta…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Github Repository: https://github.com/KangLiao929/Noise-DA/

  2. arXiv:2406.10284  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving child speech recognition with augmented child-like speech

    Authors: Yuanyuan Zhang, Zhengjun Yue, Tanvina Patel, Odette Scharenborg

    Abstract: State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child speech limits the development of child speech recognition (CSR). Therefore, we studied child-to-child voice conversion (VC) from existing child speakers in the dataset and additional (new) child speakers via monolingual and cross-lingual (Dutch-to-German) VC, respectively. The results showed that cross-lingua…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure. Accepted to INTERSPEECH 2024

  3. arXiv:2406.09815  [pdf, other]

    cs.CL cs.AI

    Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

    Authors: Zhenrui Yue, Huimin Zeng, Lanyu Shang, Yifan Liu, Yang Zhang, Dong Wang

    Abstract: The rapid propagation of misinformation poses substantial risks to public interest. To combat misinformation, large language models (LLMs) are adapted to automatically verify claim credibility. Nevertheless, existing methods heavily rely on the embedded knowledge within LLMs and / or black-box APIs for evidence collection, leading to subpar performance with smaller LLMs or upon unreliable context.…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  4. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  5. arXiv:2406.02048  [pdf, other]

    cs.IR

    Auto-Encoding or Auto-Regression? A Reality Check on Causality of Self-Attention-Based Sequential Recommenders

    Authors: Yueqi Wang, Zhankui He, Zhenrui Yue, Julian McAuley, Dong Wang

    Abstract: The comparison between Auto-Encoding (AE) and Auto-Regression (AR) has become an increasingly important topic with recent advances in sequential recommendation. At the heart of this discussion lies the comparison of BERT4Rec and SASRec, which serve as representative AE and AR models for self-attentive sequential recommenders. Yet the conclusion of this debate remains uncertain due to: (1) the lack…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2405.17221  [pdf, other]

    cs.AI cs.AR

    Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture

    Authors: Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

    Abstract: Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticat…

    Submitted 21 May, 2024; originally announced May 2024.

  7. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  8. arXiv:2405.04046  [pdf]

    cs.CR

    MBCT: A Monero-Based Covert Transmission Approach with On-chain Dynamic Session Key Negotiation

    Authors: Zhenshuai Yue, Haoran Zhu, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić, Junchao Fan

    Abstract: Traditional covert transmission (CT) approaches have been hindering CT application while blockchain technology offers new avenue. Current blockchain-based CT approaches require off-chain negotiation of critical information and often overlook the dynamic session keys updating, which increases the risk of message and key leakage. Additionally, in some approaches the covert transactions exhibit obvio…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  10. arXiv:2404.13370  [pdf, other]

    cs.CV cs.CL cs.MM

    Movie101v2: Improved Movie Narration Benchmark

    Authors: Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin

    Abstract: Automatic movie narration targets at creating video-aligned plot descriptions to assist visually impaired audiences. It differs from standard video captioning in that it requires not only describing key visual details but also inferring the plots developed across multiple movie shots, thus posing unique and ongoing challenges. To advance the development of automatic movie narrating systems, we fir…

    Submitted 20 April, 2024; originally announced April 2024.

  11. arXiv:2404.10716  [pdf, other]

    cs.CV

    MOWA: Multiple-in-One Image Warping Model

    Authors: Kang Liao, Zongsheng Yue, Zhonghua Wu, Chen Change Loy

    Abstract: While recent image warping approaches achieved remarkable success on existing benchmarks, they still require training separate models for each specific task and cannot generalize well to different camera models or customized manipulations. To address diverse types of warping in practice, we propose a Multiple-in-One image WArping model (named MOWA) in this work. Specifically, we mitigate the diffi…

    Submitted 17 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Project page: https://kangliao929.github.io/projects/mowa/

  12. arXiv:2404.01232  [pdf, other]

    cs.CL cs.CV

    Open-Vocabulary Federated Learning with Multimodal Prototyping

    Authors: Huimin Zeng, Zhenrui Yue, Dong Wang

    Abstract: Existing federated learning (FL) studies usually assume the training label space and test label space are identical. However, in real-world applications, this assumption is too ideal to be true. A new user could come up with queries that involve data from unseen classes, and such open-vocabulary queries would directly defect such FL systems. Therefore, in this work, we explicitly focus on the unde…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024

  13. arXiv:2403.14952  [pdf, other]

    cs.CL cs.AI

    Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation

    Authors: Zhenrui Yue, Huimin Zeng, Yimeng Lu, Lanyu Shang, Yang Zhang, Dong Wang

    Abstract: The proliferation of online misinformation has posed significant threats to public interest. While numerous online users actively participate in the combat against misinformation, many of such responses can be characterized by the lack of politeness and supporting facts. As a solution, text generation approaches are proposed to automatically produce counter-misinformation responses. Nevertheless,…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024

  14. arXiv:2403.07506  [pdf, other]

    cs.SE

    Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code

    Authors: Zhou Yang, Zhensu Sun, Terry Zhuo Yue, Premkumar Devanbu, David Lo

    Abstract: Large language models for code (LLM4Code), which demonstrate strong performance (e.g., high accuracy) in processing source code, have significantly transformed software engineering. Many studies separately investigate the non-functional properties of LLM4Code, but there is no systematic review of how these properties are evaluated and enhanced. This paper fills this gap by thoroughly examining 146…

    Submitted 12 March, 2024; originally announced March 2024.

  15. arXiv:2403.07319  [pdf, other]

    cs.CV

    Efficient Diffusion Model for Image Restoration by Residual Shifting

    Authors: Zongsheng Yue, Jianyi Wang, Chen Change Loy

    Abstract: While diffusion-based image restoration (IR) methods have achieved remarkable success, they are still limited by the low inference speed attributed to the necessity of executing hundreds or even thousands of sampling steps. Existing acceleration sampling techniques, though seeking to expedite the process, inevitably sacrifice performance to some extent, resulting in over-blurry restored outcomes.…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Extended version of NeurIPS paper. Code: https://github.com/zsyOAOA/ResShift

    MSC Class: I.4.4

  16. arXiv:2403.06728  [pdf, other]

    cs.CV

    Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning

    Authors: Zijian Zhou, Miaojing Shi, Meng Wei, Oluwatosin Alabi, Zijie Yue, Tom Vercauteren

    Abstract: Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists. Current RRG approaches are still unsatisfactory against clinical standards. This paper introduces a novel RRG method, LM-RRG, that integrates large models (LMs) with clinical quality reinforcement learning to generate accurate and comprehensive chest X-ray rad…

    Submitted 11 March, 2024; originally announced March 2024.

  17. arXiv:2403.04256  [pdf, other]

    cs.IR cs.AI

    Federated Recommendation via Hybrid Retrieval Augmented Generation

    Authors: Huimin Zeng, Zhenrui Yue, Qian Jiang, Dong Wang

    Abstract: Federated Recommendation (FR) emerges as a novel paradigm that enables privacy-preserving recommendations. However, traditional FR systems usually represent users/items with discrete identities (IDs), suffering from performance degradation due to the data sparsity and heterogeneity in FR. On the other hand, Large Language Models (LLMs) as recommenders have proven effective across various recommend…

    Submitted 7 March, 2024; originally announced March 2024.

  18. arXiv:2403.02649  [pdf, other]

    cs.CV

    Few-shot Learner Parameterization by Diffusion Time-steps

    Authors: Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun

    Abstract: Even when using large multi-modal foundation models, few-shot learning is still challenging -- if there is no proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced cla…

    Submitted 26 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  19. arXiv:2403.01317  [pdf, other]

    cs.LG cs.AR

    Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits

    Authors: Chenhui Deng, Zichao Yue, Cunxi Yu, Gokce Sarar, Ryan Carey, Rajeev Jain, Zhiru Zhang

    Abstract: While graph neural networks (GNNs) have gained popularity for learning circuit representations in various electronic design automation (EDA) tasks, they face challenges in scalability when applied to large graphs and exhibit limited generalizability to new designs. These limitations make them less practical for addressing large-scale, complex circuit problems. In this work we propose HOGA, a novel…

    Submitted 10 April, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at Design Automation Conference (DAC) 2024

  20. arXiv:2403.01232  [pdf, other]

    cs.LG cs.AI

    Polynormer: Polynomial-Expressive Graph Transformer in Linear Time

    Authors: Chenhui Deng, Zichao Yue, Zhiru Zhang

    Abstract: Graph transformers (GTs) have emerged as a promising architecture that is theoretically more expressive than message-passing graph neural networks (GNNs). However, typical GT models have at least quadratic complexity and thus cannot scale to large graphs. While there are several linear GTs recently proposed, they still lag behind GNN counterparts on several popular graph datasets, which poses a cr…

    Submitted 6 April, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2024

  21. arXiv:2402.18871  [pdf, other]

    eess.IV cs.CV

    LoLiSRFlow: Joint Single Image Low-light Enhancement and Super-resolution via Cross-scale Transformer-based Conditional Flow

    Authors: Ziyu Yue, Jiaxin Gao, Sihan Xie, Yang Liu, Zhixun Su

    Abstract: The visibility of real-world images is often limited by both low-light and low-resolution, however, these issues are only addressed in the literature through Low-Light Enhancement (LLE) and Super-Resolution (SR) methods. Admittedly, a simple cascade of these approaches cannot work harmoniously to cope well with the highly ill-posed problem for simultaneously enhancing visibility and resolution. I…

    Submitted 29 February, 2024; originally announced February 2024.

  22. arXiv:2402.14545  [pdf, other]

    cs.CL cs.CV

    Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

    Authors: Zihao Yue, Liang Zhang, Qin Jin

    Abstract: Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they may create content that is not present in the visual inputs. In this paper, we explore a new angle of this issue: overly detailed training data hinders the model's ability to timely terminate generation, leading to continued outputs beyond visual perception limits. By investigating how the model decides to ter…

    Submitted 29 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  23. arXiv:2401.11430  [pdf, other]

    cs.CV

    Exploring Diffusion Time-steps for Unsupervised Representation Learning

    Authors: Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang

    Abstract: Representation learning is all about discovering the hidden modular attributes that generate the data faithfully. We explore the potential of Denoising Diffusion Probabilistic Model (DM) in unsupervised learning of the modular attributes. We build a theoretical framework that connects the diffusion time-steps and the hidden attributes, which serves as an effective inductive bias for unsupervised l…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  24. arXiv:2312.15159  [pdf, other]

    cs.LG cs.AI cs.AR cs.CL

    Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference

    Authors: Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang

    Abstract: Recent advancements in large language models (LLMs) boasting billions of parameters have generated a significant demand for efficient deployment in inference workloads. The majority of existing approaches rely on temporal architectures that reuse hardware units for different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable…

    Submitted 7 April, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted for publication in the FCCM'24 Journal Track and will appear in ACM Transactions on Reconfigurable Technology and Systems (TRETS)

  25. arXiv:2311.02342  [pdf, other]

    cs.CV

    Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector

    Authors: Xuanyi Liu, Zhongqi Yue, Xian-Sheng Hua

    Abstract: Open World Object Detection (OWOD) combines open-set object detection with incremental learning capabilities to handle the challenge of the open and dynamic visual world. Existing works assume that a foreground predictor trained on the seen categories can be directly transferred to identify the unseen categories' locations by selecting the top-k most confident foreground predictions. However, the…

    Submitted 4 November, 2023; originally announced November 2023.

  26. arXiv:2311.02089  [pdf, other]

    cs.IR cs.AI cs.CL

    LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking

    Authors: Zhenrui Yue, Sara Rabhi, Gabriel de Souza Pereira Moreira, Dong Wang, Even Oldridge

    Abstract: Recently, large language models (LLMs) have exhibited significant progress in language understanding and generation. By leveraging textual features, customized LLMs are also applied for recommendation and demonstrate improvements across diverse recommendation scenarios. Yet the majority of existing methods perform training-free recommendation that heavily relies on pretrained knowledge (e.g., movi…

    Submitted 25 October, 2023; originally announced November 2023.

    Comments: Accepted to PGAI@CIKM 2023

  27. arXiv:2310.14652  [pdf, other]

    cs.CV

    Invariant Feature Regularization for Fair Face Recognition

    Authors: Jiali Ma, Zhongqi Yue, Kagaya Tomoyuki, Suzuki Tomoki, Karlekar Jayashree, Sugiri Pranata, Hanwang Zhang

    Abstract: Fair face recognition is all about learning invariant feature that generalizes to unseen faces in any demographic group. Unfortunately, face datasets inevitably capture the imbalanced demographic attributes that are ubiquitous in real-world observations, and the model learns biased feature that generalizes poorly in the minority group. We point out that the bias arises due to the confounding demog…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by International Conference on Computer Vision (ICCV) 2023

  28. arXiv:2310.11595  [pdf, other]

    cs.CV cs.AI

    WaveAttack: Asymmetric Frequency Obfuscation-based Backdoor Attacks Against Deep Neural Networks

    Authors: Jun Xia, Zhihao Yue, Yingbo Zhou, Zhiwei Ling, Xian Wei, Mingsong Chen

    Abstract: Due to the popularity of Artificial Intelligence (AI) technology, numerous backdoor attacks are designed by adversaries to mislead deep neural network predictions by manipulating training samples and training processes. Although backdoor attacks are effective in various real scenarios, they still suffer from the problems of both low fidelity of poisoned samples and non-negligible transfer in laten…

    Submitted 19 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

  29. arXiv:2310.02367  [pdf, other]

    cs.IR

    Linear Recurrent Units for Sequential Recommendation

    Authors: Zhenrui Yue, Yueqi Wang, Zhankui He, Huimin Zeng, Julian McAuley, Dong Wang

    Abstract: State-of-the-art sequential recommendation relies heavily on self-attention-based recommender models. Yet such models are computationally expensive and often too slow for real-time recommendation. Furthermore, the self-attention operation is performed at a sequence-level, thereby making low-cost incremental inference challenging. Inspired by recent advances in efficient language modeling, we propo…

    Submitted 8 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted to WSDM 2024

  30. arXiv:2309.12742  [pdf, other]

    cs.LG

    Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation

    Authors: Zhongqi Yue, Hanwang Zhang, Qianru Sun

    Abstract: Domain Adaptation (DA) is always challenged by the spurious correlation between domain-invariant features (e.g., class identity) and domain-specific features (e.g., environment) that does not generalize to the target domain. Unfortunately, even enriched with additional unsupervised target domains, existing Unsupervised DA (UDA) methods still suffer from it. This is because the source domain superv…

    Submitted 3 December, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted by NeurIPS 2023

  31. arXiv:2309.05267  [pdf, other]

    cs.CV

    Diving into Darkness: A Dual-Modulated Framework for High-Fidelity Super-Resolution in Ultra-Dark Environments

    Authors: Jiaxin Gao, Ziyu Yue, Yaohua Liu, Sihan Xie, Xin Fan, Risheng Liu

    Abstract: Super-resolution tasks oriented to images captured in ultra-dark environments is a practical yet challenging problem that has received little attention. Due to uneven illumination and low signal-to-noise ratio in dark environments, a multitude of problems such as lack of detail and color distortion may be magnified in the super-resolution process compared to normal-lighting environments. Consequen…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 9 pages

  32. arXiv:2308.06227  [pdf, other]

    cs.ET cs.AR

    Comprehensive Benchmarking of Binary Neural Networks on NVM Crossbar Architectures

    Authors: Ruirong Huang, Zichao Yue, Caroline Huang, Janarbek Matai, Zhiru Zhang

    Abstract: Non-volatile memory (NVM) crossbars have been identified as a promising technology, for accelerating important machine learning operations, with matrix-vector multiplication being a key example. Binary neural networks (BNNs) are especially well-suited for use with NVM crossbars due to their use of a low-bitwidth representation for both activations and weights. However, the aggressive quantization…

    Submitted 11 August, 2023; originally announced August 2023.

  33. arXiv:2307.14638  [pdf, other]

    cs.CV

    EqGAN: Feature Equalization Fusion for Few-shot Image Generation

    Authors: Yingbo Zhou, Zhihao Yue, Yutong Ye, Pengyu Zhang, Xian Wei, Mingsong Chen

    Abstract: Due to the absence of fine structure and texture information, existing fusion-based few-shot image generation methods suffer from unsatisfactory generation quality and diversity. To address this problem, we propose a novel feature Equalization fusion Generative Adversarial Network (EqGAN) for few-shot image generation. Unlike existing fusion strategies that rely on either deep features or local re…

    Submitted 27 July, 2023; originally announced July 2023.

  34. arXiv:2307.12348  [pdf, other]

    cs.CV

    ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting

    Authors: Zongsheng Yue, Jianyi Wang, Chen Change Loy

    Abstract: Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed due to the requirements of hundreds or even thousands of sampling steps. Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduc…

    Submitted 18 October, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 (Spotlight). Project page: https://zsyoaoa.github.io/projects/resshift/

    ACM Class: I.2.10

  35. arXiv:2307.08499  [pdf, other]

    cs.NI

    Age of Information in Locally Adaptive Frame Slotted ALOHA

    Authors: Zhiling Yue, Howard H. Yang, Meng Zhang, Nikolaos Pappas

    Abstract: We consider a random access network consisting of source-destination pairs. Each source node generates status updates and transmits this information to its intended destination over a shared spectrum. The goal is to minimize the network-wide Age of Information (AoI). We develop a frame slotted ALOHA (FSA)-based policy for generating and transmitting status updates, where the frame size of each sou…

    Submitted 17 July, 2023; originally announced July 2023.

  36. arXiv:2307.08249  [pdf, other]

    cs.CV

    Random Boxes Are Open-world Object Detectors

    Authors: Yanghao Wang, Zhongqi Yue, Xian-Sheng Hua, Hanwang Zhang

    Abstract: We show that classifiers trained with random region proposals achieve state-of-the-art Open-world Object Detection (OWOD): they can not only maintain the accuracy of the known objects (w/ training labels), but also considerably improve the recall of unknown ones (w/o training labels). Specifically, we propose RandBox, a Fast R-CNN based architecture trained on random proposals at each training ite…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: ICCV 2023

  37. arXiv:2306.13460  [pdf, other]

    cs.CL

    Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

    Authors: Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin

    Abstract: Image captioning aims to describe visual content in natural language. As 'a picture is worth a thousand words', there could be various correct descriptions for an image. However, with maximum likelihood estimation as the training objective, the captioning model is penalized whenever its prediction mismatches with the label. For instance, when the model predicts a word expressing richer semantics t…

    Submitted 28 October, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  38. arXiv:2306.08736  [pdf, other]

    cs.CV

    LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation

    Authors: Linfeng Yuan, Miaojing Shi, Zijie Yue, Qijun Chen

    Abstract: Referring video object segmentation (RVOS) aims to segment the target instance referred by a given text expression in a video clip. The text expression normally contains sophisticated description of the instance's appearance, action, and relation with others. It is therefore rather difficult for a RVOS model to capture all these attributes correspondingly in the video; in fact, the model often fav…

    Submitted 1 April, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: CVPR2024

  39. arXiv:2305.17373  [pdf, other]

    cs.CL cs.AI

    Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning

    Authors: Zhenrui Yue, Huimin Zeng, Mengfei Lan, Heng Ji, Dong Wang

    Abstract: With emerging online topics as a source for numerous new events, detecting unseen / rare event types presents an elusive challenge for existing event detection methods, where only limited data access is provided for training. To address the data scarcity problem in event detection, we propose MetaEvent, a meta learning-based framework for zero- and few-shot event detection. Specifically, we sample…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  40. arXiv:2305.12692  [pdf, other]

    cs.CL cs.AI

    MetaAdapt: Domain Adaptive Few-Shot Misinformation Detection via Meta Learning

    Authors: Zhenrui Yue, Huimin Zeng, Yang Zhang, Lanyu Shang, Dong Wang

    Abstract: With emerging topics (e.g., COVID-19) on social media as a source for the spreading misinformation, overcoming the distributional shifts between the original training domain (i.e., source domain) and such target domains remains a non-trivial task for misinformation detection. This presents an elusive challenge for early-stage misinformation detection, where a good amount of data and annotations fr…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  41. arXiv:2305.12140  [pdf, other]

    cs.CV cs.MM

    Movie101: A New Movie Understanding Benchmark

    Authors: Zihao Yue, Qi Zhang, Anwen Hu, Liang Zhang, Ziheng Wang, Qin Jin

    Abstract: To help the visually impaired enjoy movies, automatic movie narrating systems are expected to narrate accurate, coherent, and role-aware plots when there are no speaking lines of actors. Existing works benchmark this challenge as a normal video captioning task via some simplifications, such as removing role names and evaluating narrations with ngram-based metrics, which makes it difficult for auto…

    Submitted 27 June, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  42. arXiv:2305.10925  [pdf, other]

    cs.CV eess.IV

    Unsupervised Hyperspectral Pansharpening via Low-rank Diffusion Model

    Authors: Xiangyu Rui, Xiangyong Cao, Li Pang, Zeyu Zhu, Zongsheng Yue, Deyu Meng

    Abstract: Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing handcraft image prior to characterize the image features, and deep learning-based HS pansharpening methods usually require a la…

    Submitted 19 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  43. arXiv:2305.10730  [pdf, other]

    cs.LG

    FedMR: Federated Learning via Model Recombination

    Authors: Ming Hu, Zhihao Yue, Zhiwei Ling, Yihao Huang, Cheng Chen, Xian Wei, Yang Liu, Mingsong Chen

    Abstract: Although Federated Learning (FL) enables global model training across clients without compromising their raw data, existing Federated Averaging (FedAvg)-based methods suffer from the problem of low inference performance, especially for unevenly distributed data among clients. This is mainly because i) FedAvg initializes client models with the same global models, which makes the local training hard…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.07677

  44. arXiv:2305.07015  [pdf, other]

    cs.CV

    Exploiting Diffusion Prior for Real-World Image Super-Resolution

    Authors: Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin C. K. Chan, Chen Change Loy

    Abstract: We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution (SR). Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity cause…

    Submitted 28 June, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCV'2024. Some Figs are compressed due to size limits. Uncompressed ver.: https://github.com/IceClear/StableSR/releases/download/UncompressedPDF/StableSR_IJCV_Uncompressed.pdf. Project page: https://iceclear.github.io/projects/stablesr/

  45. arXiv:2303.12369  [pdf, other]

    cs.CV

    Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

    Authors: Hui Lv, Zhongqi Yue, Qianru Sun, Bin Luo, Zhen Cui, Hanwang Zhang

    Abstract: Weakly Supervised Video Anomaly Detection (WSVAD) is challenging because the binary anomaly label is only given on the video level, but the output requires snippet-level predictions. So, Multiple Instance Learning (MIL) is prevailing in WSVAD. However, MIL is notoriously known to suffer from many false alarms because the snippet-level detector is easily biased towards the abnormal snippets with si…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 11 pages, 10 figures

  46. arXiv:2303.03895  [pdf, other]

    cs.IT eess.SY

    Age of Information Under Frame Slotted ALOHA-Based Status Updating Protocol

    Authors: Zhiling Yue, Howard H. Yang, Meng Zhang, Nikolaos Pappas

    Abstract: We propose a frame slotted ALOHA (FSA)-based protocol for a random access network where sources transmit status updates to their intended destinations. We evaluate the effect of such a protocol on the network's timeliness performance using the Age of Information (AoI) metric. Specifically, we leverage tools from stochastic geometry to model the spatial positions of the source-destination pairs and…

    Submitted 7 March, 2023; originally announced March 2023.

  47. arXiv:2212.06512  [pdf, other]

    cs.CV

    DifFace: Blind Face Restoration with Diffused Error Contraction

    Authors: Zongsheng Yue, Chen Change Loy

    Abstract: While deep learning-based methods for blind face restoration have achieved unprecedented success, they still suffer from two major limitations. First, most of them deteriorate when facing complex degradations out of their training data. Second, these methods require multiple constraints, e.g., fidelity, perceptual, and adversarial losses, which require laborious hyper-parameter tuning to stabilize…

    Submitted 11 December, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Extended to Face Inpainting. Project page: https://github.com/zsyOAOA/DifFace

    ACM Class: I.4.4

  48. arXiv:2212.02006  [pdf, other]

    cs.LG

    HierarchyFL: Heterogeneous Federated Learning via Hierarchical Self-Distillation

    Authors: Jun Xia, Yi Zhang, Zhihao Yue, Ming Hu, Xian Wei, Mingsong Chen

    Abstract: Federated learning (FL) has been recognized as a privacy-preserving distributed machine learning paradigm that enables knowledge sharing among various heterogeneous artificial intelligence (AIoT) devices through centralized global model aggregation. FL suffers from model inaccuracy and slow convergence due to the model heterogeneity of the AIoT devices involved. Although various existing methods t…

    Submitted 4 December, 2022; originally announced December 2022.

  49. GitFL: Adaptive Asynchronous Federated Learning using Version Control

    Authors: Ming Hu, Zeke Xia, Zhihao Yue, Jun Xia, Yihao Huang, Yang Liu, Mingsong Chen

    Abstract: As a promising distributed machine learning paradigm that enables collaborative training without compromising data privacy, Federated Learning (FL) has been increasingly used in AIoT (Artificial Intelligence of Things) design. However, due to the lack of efficient management of straggling devices, existing FL methods greatly suffer from the problems of low inference accuracy and long training time…

    Submitted 22 November, 2022; originally announced November 2022.

  50. arXiv:2210.15401  [pdf, other]

    cs.CV

    Facial Video-based Remote Physiological Measurement via Self-supervised Learning

    Authors: Zijie Yue, Miaojing Shi, Shuai Ding

    Abstract: Facial video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human face videos and then measure multiple vital signs (e.g. heart rate, respiration frequency) from rPPG signals. Recent approaches achieve it by training deep neural networks, which normally require abundant facial videos and synchronously recorded photoplethysmography (PPG) sign…

    Submitted 22 July, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence