
Showing 1–50 of 283 results for author: He, R

Searching in archive cs.
  1. arXiv:2406.14635  [pdf, other]

    cs.AI cs.LG

    Harvesting Efficient On-Demand Order Pooling from Skilled Couriers: Enhancing Graph Representation Learning for Refining Real-time Many-to-One Assignments

    Authors: Yile Liang, Jiuxia Zhao, Donghui Li, Jie Feng, Chen Zhang, Xuetao Ding, Jinghua Hao, Renqing He

    Abstract: The recent past has witnessed a notable surge in on-demand food delivery (OFD) services, offering delivery fulfillment within dozens of minutes after an order is placed. In OFD, pooling multiple orders for simultaneous delivery in real-time order assignment is a pivotal efficiency source, which may in turn extend delivery time. Constructing high-quality order pooling to harmonize platform efficien…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted in KDD 2024 ADS Track

  2. arXiv:2406.12754  [pdf, other]

    cs.CL cs.AI

    Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evalua…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.08855  [pdf, other]

    cs.RO

    Trajectory Planning for Autonomous Driving in Unstructured Scenarios Based on Graph Neural Network and Numerical Optimization

    Authors: Sumin Zhang, Kuo Li, Rui He, Zhiwei Meng, Yupeng Chang, Xiaosong **, Ri Bai

    Abstract: In unstructured environments, obstacles are diverse and lack lane markings, making trajectory planning for intelligent vehicles a challenging task. Traditional trajectory planning methods typically involve multiple stages, including path planning, speed planning, and trajectory optimization. These methods require the manual design of numerous parameters for each stage, resulting in significant wor…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.00908  [pdf, other]

    cs.CV

    ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

    Authors: Shaoshu Yang, Yong Zhang, Xiaodong Cun, Ying Shan, Ran He

    Abstract: Video generation has made remarkable progress in recent years, especially since the advent of the video diffusion models. Many video generation models can produce plausible synthetic videos, e.g., Stable Video Diffusion (SVD). However, most video models can only generate low frame rate videos due to the limited GPU memory as well as the difficulty of modeling a large set of frames. The training vi…

    Submitted 2 June, 2024; originally announced June 2024.

  5. arXiv:2405.20044  [pdf, other]

    cs.CV

    A Point-Neighborhood Learning Framework for Nasal Endoscope Image Segmentation

    Authors: Pengyu Jie, Wanquan Liu, Chenqiang Gao, Yihui Wen, Rui He, Pengcheng Li, **tao Zhang, Deyu Meng

    Abstract: The lesion segmentation on endoscopic images is challenging due to its complex and ambiguous features. Fully-supervised deep learning segmentation methods can receive good performance based on entirely pixel-level labeled dataset but greatly increase experts' labeling burden. Semi-supervised and weakly supervised methods can ease labeling burden, but heavily strengthen the learning difficulty. To…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures

  6. arXiv:2405.17815  [pdf, other]

    cs.CV

    Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

    Authors: Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: In the realm of Multimodal Large Language Models (MLLMs), vision-language connector plays a crucial role to link the pre-trained vision encoders with Large Language Models (LLMs). Despite its importance, the vision-language connector has been relatively less explored. In this study, we aim to propose a strong vision-language connector that enables MLLMs to achieve high accuracy while maintain low…

    Submitted 28 May, 2024; originally announced May 2024.

  7. arXiv:2405.16240  [pdf, other]

    cs.LG

    Analytic Federated Learning

    Authors: Huiping Zhuang, Run He, Kai Tong, Di Fang, Han Sun, Haoran Li, Tianyi Chen, Ziqian Zeng

    Abstract: In this paper, we introduce analytic federated learning (AFL), a new training paradigm that brings analytical (i.e., closed-form) solutions to the federated learning (FL) community. Our AFL draws inspiration from analytic learning -- a gradient-free technique that trains neural networks with analytical solutions in one epoch. In the local client training stage, the AFL facilitates a one-epoch trai…

    Submitted 25 May, 2024; originally announced May 2024.

  8. arXiv:2405.16093  [pdf, other]

    cs.CV

    Diverse Teacher-Students for Deep Safe Semi-Supervised Learning under Class Mismatch

    Authors: Qikai Wang, Rundong He, Yongshun Gong, Chunxiao Ren, Haoliang Sun, Xiaoshui Huang, Yilong Yin

    Abstract: Semi-supervised learning can significantly boost model performance by leveraging unlabeled data, particularly when labeled data is scarce. However, real-world unlabeled data often contain unseen-class samples, which can hinder the classification of seen classes. To address this issue, mainstream safe SSL methods suggest detecting and discarding unseen-class samples from unlabeled data. Nevertheles…

    Submitted 25 May, 2024; originally announced May 2024.

  9. arXiv:2405.13949  [pdf, other]

    cs.CV

    PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery

    Authors: Runlong He, Mengya Xu, Adrito Das, Danyal Z. Khan, Sophia Bano, Hani J. Marcus, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam

    Abstract: Visual Question Answering (VQA) within the surgical domain, utilizing Large Language Models (LLMs), offers a distinct opportunity to improve intra-operative decision-making and facilitate intuitive surgeon-AI interaction. However, the development of LLMs for surgical VQA is hindered by the scarcity of diverse and extensive datasets with complex reasoning tasks. Moreover, contextual fusion of the i…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures

  10. arXiv:2405.13337  [pdf, other]

    cs.CV

    Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer

    Authors: Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He

    Abstract: The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, its global attention mechanism's quadratic complexity poses substantial computational burdens. A common remedy spatially groups tokens for self-attention, reducing computational requirements. Nonetheless, this strategy neglects semantic information in tokens, possibly scattering semantically-l…

    Submitted 22 May, 2024; originally announced May 2024.

  11. arXiv:2405.13335  [pdf, other]

    cs.CV

    Vision Transformer with Sparse Scan Prior

    Authors: Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He

    Abstract: In recent years, Transformers have achieved remarkable progress in computer vision tasks. However, their global modeling often comes with substantial computational overhead, in stark contrast to the human eye's efficient information processing. Inspired by the human eye's sparse scanning mechanism, we propose a Sparse Scan Self-Attention mechanism (…

    Submitted 22 May, 2024; originally announced May 2024.

  12. arXiv:2405.07508  [pdf, other]

    cs.SE

    Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects

    Authors: Runzhi He, Hengzhi Ye, Minghui Zhou

    Abstract: Background: Open Source Software is the building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of the downstream systems and the broad ecosystem. Therefore it calls for efforts in monitoring and predicting project deprecations, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly focus on…

    Submitted 13 May, 2024; originally announced May 2024.

  13. arXiv:2404.15684  [pdf, other]

    cs.NI

    Generative Diffusion Model (GDM) for Optimization of Wi-Fi Networks

    Authors: Tie Liu, Xuming Fang, Rong He

    Abstract: Generative Diffusion Models (GDMs) have made significant strides in modeling complex data distributions across diverse domains. Meanwhile, Deep Reinforcement Learning (DRL) has demonstrated substantial improvements in optimizing Wi-Fi network performance. Wi-Fi optimization problems are highly challenging to model mathematically, and DRL methods can bypass complex mathematical modeling, while GDM…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: This paper has been submitted to GlobeCom 2024 and is currently under review

  14. arXiv:2404.06022  [pdf, other]

    cs.CV cs.AI cs.MM

    Band-Attention Modulated RetNet for Face Forgery Detection

    Authors: Zhida Zhang, Jie Cao, Wenkui Yang, Qihang Fan, Kai Zhou, Ran He

    Abstract: The transformer networks are extensively utilized in face forgery detection due to their scalability across large datasets. Despite their success, transformers face challenges in balancing the capture of global context, which is crucial for unveiling forgery clues, with computational complexity. To mitigate this issue, we introduce Band-Attention modulated RetNet (BAR-Net), a lightweight network des…

    Submitted 9 April, 2024; originally announced April 2024.

  15. arXiv:2404.04565  [pdf, other]

    cs.CV

    SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

    Authors: Tao Wu, Runyu He, Gangshan Wu, Limin Wang

    Abstract: Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder the progress of research in this area. First, they do not explore complex human-human interactions in multi-person scenarios. Second, the relation types of existin…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  16. arXiv:2404.00323  [pdf, other]

    cs.CV cs.LG

    CLIP-driven Outliers Synthesis for few-shot OOD detection

    Authors: Hao Sun, Rundong He, Zhongyi Han, Zhicong Lin, Yongshun Gong, Yilong Yin

    Abstract: Few-shot OOD detection focuses on recognizing out-of-distribution (OOD) images that belong to classes unseen during training, with the use of only a small number of labeled in-distribution (ID) images. Up to now, a mainstream strategy is based on large-scale vision-language models, such as CLIP. However, these methods overlook a crucial issue: the lack of reliable OOD supervision information, whic…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  17. arXiv:2403.18361  [pdf, other]

    cs.CV

    ViTAR: Vision Transformer with Any Resolution

    Authors: Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen during training. Our work introduces two key innovations to address this issue. Firstly, we propose a novel module for dynamic resolution adjustment, d…

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  18. arXiv:2403.17765  [pdf, other]

    cs.CV

    MUTE-SLAM: Real-Time Neural SLAM with Multiple Tri-Plane Hash Representations

    Authors: Yifan Yan, Ruomin He, Zhenghua Liu

    Abstract: We introduce MUTE-SLAM, a real-time neural RGB-D SLAM system employing multiple tri-plane hash-encodings for efficient scene representation. MUTE-SLAM effectively tracks camera positions and incrementally builds a scalable multi-map representation for both small and large indoor environments. It dynamically allocates sub-maps for newly observed local regions, enabling constraint-free mapping witho…

    Submitted 26 March, 2024; originally announced March 2024.

  19. arXiv:2403.17503  [pdf, other]

    cs.LG cs.CV

    DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning

    Authors: Huiping Zhuang, Run He, Kai Tong, Ziqian Zeng, Cen Chen, Zhiping Lin

    Abstract: Class-incremental learning (CIL) under an exemplar-free constraint has presented a significant challenge. Existing methods adhering to this constraint are prone to catastrophic forgetting, far more so than replay-based techniques that retain access to past samples. In this paper, to solve the exemplar-free CIL problem, we propose a Dual-Stream Analytic Learning (DS-AL) approach. The DS-AL contains…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted in AAAI 2024

  20. arXiv:2403.15751  [pdf, other]

    cs.CV

    AOCIL: Exemplar-free Analytic Online Class Incremental Learning with Low Time and Resource Consumption

    Authors: Huiping Zhuang, Yuchen Liu, Run He, Kai Tong, Ziqian Zeng, Cen Chen, Yi Wang, Lap-Pui Chau

    Abstract: Online Class Incremental Learning (OCIL) aims to train the model in a task-by-task manner, where data arrive in mini-batches at a time while previous data are not accessible. A significant challenge is known as Catastrophic Forgetting, i.e., loss of the previous knowledge on old data. To address this, replay-based methods show competitive results but invade data privacy, while exemplar-free method…

    Submitted 23 March, 2024; originally announced March 2024.

  21. arXiv:2403.15706  [pdf, other]

    cs.LG cs.CV

    G-ACIL: Analytic Learning for Exemplar-Free Generalized Class Incremental Learning

    Authors: Huiping Zhuang, Yizhu Chen, Di Fang, Run He, Kai Tong, Hongxin Wei, Ziqian Zeng, Cen Chen

    Abstract: Class incremental learning (CIL) trains a network on sequential tasks with separated categories but suffers from catastrophic forgetting, where models quickly lose previously learned knowledge when acquiring new tasks. The generalized CIL (GCIL) aims to address the CIL problem in a more real-world scenario, where incoming data have mixed data categories and unknown sample size distribution, leadin…

    Submitted 13 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  22. arXiv:2403.13804  [pdf, other]

    cs.CV cs.CL cs.LG

    Learning from Models and Data for Visual Grounding

    Authors: Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez

    Abstract: We introduce SynGround, a novel framework that combines data-driven learning and knowledge transfer from various large-scale pretrained models to enhance the visual grounding capabilities of a pretrained vision-and-language model. The knowledge transfer from the models initiates the generation of image descriptions through an image description generator. These descriptions serve dual purposes: the…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Project Page: https://catherine-r-he.github.io/SynGround/

  23. arXiv:2403.13522  [pdf, other]

    cs.LG cs.CV

    REAL: Representation Enhanced Analytic Learning for Exemplar-free Class-incremental Learning

    Authors: Run He, Huiping Zhuang, Di Fang, Yizhu Chen, Kai Tong, Cen Chen

    Abstract: Exemplar-free class-incremental learning (EFCIL) aims to mitigate catastrophic forgetting in class-incremental learning without available historical data. Compared with its counterpart (replay-based CIL) that stores historical samples, the EFCIL suffers more from forgetting issues under the exemplar-free constraint. In this paper, inspired by the recently developed analytic learning (AL) based CIL…

    Submitted 20 March, 2024; originally announced March 2024.

  24. arXiv:2403.10098  [pdf, other]

    cs.CV

    DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration

    Authors: Nan Gao, Jia Li, Huaibo Huang, Zhi Zeng, Ke Shang, Shuwu Zhang, Ran He

    Abstract: Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of degradation patterns. Current methods have low generalization across photorealistic and heterogeneous domains. In this paper, we propose a Diffusion-Information-Diffusion (DID) framework to tackle diffusion manifold hallucination correction (DiffMAC), which achieves high-generalization face restoration in divers…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 15 pages, 12 figures

  25. arXiv:2403.05924  [pdf, other]

    cs.CV

    CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning

    Authors: Yanyi Zhang, Qi Jia, Xin Fan, Yu Liu, Ran He

    Abstract: Attribute and object (A-O) disentanglement is a fundamental and critical problem for Compositional Zero-shot Learning (CZSL), whose aim is to recognize novel A-O compositions based on foregone knowledge. Existing methods based on disentangled representation learning lose sight of the contextual dependency between the A-O primitive pairs. Inspired by this, we propose a novel A-O disentangled framew…

    Submitted 13 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: ICASSP 2024

  26. arXiv:2403.03015  [pdf, other]

    cs.IT eess.SP

    Low Complexity Channel Estimation for RIS-Assisted THz Systems with Beam Split

    Authors: Xin Su, Ruisi He, Peng Zhang, Bo Ai

    Abstract: To support extremely high data rates, reconfigurable intelligent surface (RIS)-assisted terahertz (THz) communication is considered to be a promising technology for future sixth-generation networks. However, due to the typical employment of hybrid beamforming architecture in THz systems, as well as the passive nature of RIS which lacks the capability to process pilot signals, obtaining channel sta…

    Submitted 5 March, 2024; originally announced March 2024.

  27. arXiv:2403.01487  [pdf, other]

    cs.CV

    InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

    Authors: Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: Multimodal Large Language Models (MLLMs) have experienced significant advancements recently. Nevertheless, challenges persist in the accurate recognition and comprehension of intricate details within high-resolution images. Despite being indispensable for the development of robust MLLMs, this area remains underinvestigated. To tackle this challenge, our work introduces InfiMM-HD, a novel architect…

    Submitted 3 March, 2024; originally announced March 2024.

  28. arXiv:2402.15080  [pdf, other]

    cs.CL

    Infusing Hierarchical Guidance into Prompt Tuning: A Parameter-Efficient Framework for Multi-level Implicit Discourse Relation Recognition

    Authors: Haodong Zhao, Ruifang He, Mengnan Xiao, Jing Xu

    Abstract: Multi-level implicit discourse relation recognition (MIDRR) aims at identifying hierarchical discourse relations among arguments. Previous methods achieve the promotion through fine-tuning PLMs. However, due to the data scarcity and the task gap, the pre-trained feature space cannot be accurately tuned to the task-specific space, which even aggravates the collapse of the vanilla space. Besides, th…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: accepted to ACL 2023

  29. arXiv:2402.14600  [pdf, other]

    cs.AI

    Diffusion Model-Based Multiobjective Optimization for Gasoline Blending Scheduling

    Authors: Wenxuan Fang, Wei Du, Renchu He, Yang Tang, Yaochu Jin, Gary G. Yen

    Abstract: Gasoline blending scheduling uses resource allocation and operation sequencing to meet a refinery's production requirements. The presence of nonlinearity, integer constraints, and a large number of decision variables adds complexity to this problem, posing challenges for traditional and evolutionary algorithms. This paper introduces a novel multiobjective optimization approach driven by a diffusio…

    Submitted 4 February, 2024; originally announced February 2024.

  30. arXiv:2402.14577  [pdf, other]

    cs.CV

    Debiasing Text-to-Image Diffusion Models

    Authors: Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi

    Abstract: Learning-based Text-to-Image (TTI) models like Stable Diffusion have revolutionized the way visual content is generated in various domains. However, recent research has shown that nonnegligible social bias exists in current state-of-the-art TTI systems, which raises important concerns. In this work, we target resolving the social bias in TTI diffusion models. We begin by formalizing the problem se…

    Submitted 22 February, 2024; originally announced February 2024.

  31. arXiv:2402.12424  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs

    Authors: Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea

    Abstract: In this paper, we investigate the effectiveness of various LLMs in interpreting tabular data through different prompting strategies and data formats. Our analyses extend across six benchmarks for table-related tasks such as question-answering and fact-checking. We introduce for the first time the assessment of LLMs' performance on image-based table representations. Specifically, we compare five te…

    Submitted 5 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 Findings

  32. arXiv:2402.04087  [pdf, other]

    cs.CV cs.AI cs.LG

    A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

    Authors: Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapter, to enhance CLIP's performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with lim…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  33. arXiv:2402.04050  [pdf, other]

    cs.LG cs.AI cs.CV

    Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

    Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: With the emergence of pretrained vision-language models (VLMs), considerable efforts have been devoted to fine-tuning them for downstream tasks. Despite the progress made in designing efficient fine-tuning methods, such methods require access to the model's parameters, which can be challenging as model owners often opt to provide their models as a black box to safeguard model ownership. This paper…

    Submitted 3 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  34. arXiv:2402.03124  [pdf, other]

    cs.CR cs.CV cs.LG

    Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks

    Authors: Yanbo Wang, Jian Liang, Ran He

    Abstract: Gradient inversion attacks aim to reconstruct local training data from intermediate gradients exposed in the federated learning framework. Despite successful attacks, all previous methods, starting from reconstructing a single data point and then relaxing the single-image limit to batch level, are only tested under hard label constraints. Even for single-image reconstruction, we still lack an anal…

    Submitted 15 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICLR2024 poster

  35. arXiv:2401.10274  [pdf, ps, other]

    cs.NE cs.AI

    Knowledge-Assisted Dual-Stage Evolutionary Optimization of Large-Scale Crude Oil Scheduling

    Authors: Wanting Zhang, Wei Du, Guo Yu, Renchu He, Wenli Du, Yaochu Jin

    Abstract: With the scaling up of crude oil scheduling in modern refineries, large-scale crude oil scheduling problems (LSCOSPs) emerge with thousands of binary variables and non-linear constraints, which are challenging to be optimized by traditional optimization methods. To solve LSCOSPs, we take the practical crude oil scheduling from a marine-access refinery as an example and start with modeling LSCOSPs…

    Submitted 9 January, 2024; originally announced January 2024.

  36. arXiv:2401.06030  [pdf, other]

    cs.CR

    Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation

    Authors: Lijun Sheng, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Model adaptation tackles the distribution shift problem with a pre-trained model instead of raw data, becoming a popular paradigm due to its great privacy protection. Existing methods always assume adapting to a clean target domain, overlooking the security risks of unlabeled samples. In this paper, we explore the potential backdoor attacks on model adaptation launched by well-designed poisoning t…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 11 pages, 4 figures

  37. arXiv:2401.02329  [pdf, other]

    cs.LG cs.CV

    Not all Minorities are Equal: Empty-Class-Aware Distillation for Heterogeneous Federated Learning

    Authors: Kuangpu Guo, Yuhe Ding, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Data heterogeneity, characterized by disparities in local data distribution across clients, poses a significant challenge in federated learning. Substantial efforts have been devoted to addressing the heterogeneity in local label distribution. As minority classes suffer from worse accuracy due to overfitting on local imbalanced data, prior methods often incorporate class-balanced learning techniqu…

    Submitted 4 January, 2024; originally announced January 2024.

  38. arXiv:2312.10890  [pdf, other]

    cs.CV cs.GR

    Low-latency Space-time Supersampling for Real-time Rendering

    Authors: Ruian He, Shili Zhou, Yuqi Sun, Ri Cheng, Weimin Tan, Bo Yan

    Abstract: With the rise of real-time rendering and the evolution of display devices, there is a growing demand for post-processing methods that offer high-resolution content in a high frame rate. Existing techniques often suffer from quality and latency issues due to the disjointed treatment of frame supersampling and extrapolation. In this paper, we recognize the shared context and mechanisms between frame…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  39. arXiv:2312.07424  [pdf, other]

    cs.LG cs.AI cs.CV

    How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation

    Authors: Zhongyi Han, Guanglin Zhou, Rundong He, Jindong Wang, Tailin Wu, Yilong Yin, Salman Khan, Lina Yao, Tongliang Liu, Kun Zhang

    Abstract: In machine learning, generalization against distribution shifts -- where deployment conditions diverge from the training scenarios -- is crucial, particularly in fields like climate modeling, biomedicine, and autonomous driving. The emergence of foundation models, distinguished by their extensive pretraining and task versatility, has led to an increased interest in their adaptability to distributi…

    Submitted 25 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: added the investigation of Gemini. 66 pages, 41 figures

  40. arXiv:2312.07180  [pdf, other]

    cs.CV

    Context-Aware Iteration Policy Network for Efficient Optical Flow Estimation

    Authors: Ri Cheng, Ruian He, Xuhao Jiang, Shili Zhou, Weimin Tan, Bo Yan

    Abstract: Existing recurrent optical flow estimation networks are computationally expensive since they use a fixed large number of iterations to update the flow field for each sample. An efficient network should skip iterations when the flow improvement is limited. In this paper, we develop a Context-Aware Iteration Policy Network for efficient optical flow estimation, which determines the optimal number of…

    Submitted 5 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: 2024, Association for the Advancement of Artificial Intelligence

  41. arXiv:2312.04554  [pdf, other]

    cs.CV cs.CL cs.LG

    Improved Visual Grounding through Self-Consistent Explanations

    Authors: Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez

    Abstract: Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization --"grounding"-- abilities of these models can be further improved by finetuning for self-consistent visual explanations. We propose a strategy for augmenting existing text-image datasets with par…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project Page: https://catherine-r-he.github.io/SelfEQ/

  42. arXiv:2312.02918  [pdf, other]

    cs.CV

    Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

    Authors: Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He

    Abstract: Despite substantial progress, all-in-one image restoration (IR) grapples with persistent challenges in handling intricate real-world degradations. This paper introduces MPerceiver: a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration. Specifically, we develop a dual-branch mod…

    Submitted 20 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 13 pages, 8 figures, 9 tables

  43. arXiv:2312.02212  [pdf, other]

    cs.CV

    Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting

    Authors: ** Liu, Huaibo Huang, Chao **, Ran He

    Abstract: Face stylization refers to the transformation of a face into a specific portrait style. However, current methods require the use of example-based adaptation approaches to fine-tune pre-trained generative models so that they demand lots of time and storage space and fail to achieve detailed style transformation. This paper proposes a training-free face stylization framework, named Portrait Diffusio…

    Submitted 3 December, 2023; originally announced December 2023.

  44. arXiv:2312.01663  [pdf, other]

    cs.CV cs.AI

    Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

    Authors: Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu

    Abstract: In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining desired editing results conformed with the editing prompt is nontrivial since there exist two significant challenges, including accurate editing of only foreground regions and multi-view consistency…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 14 pages, 13 figures, project website: https://customnerf.github.io/

  45. Throughput Maximization for Intelligent Refracting Surface Assisted mmWave High-Speed Train Communications

    Authors: **g Li, Yong Niu, Hao Wu, Bo Ai, Ruisi He, Ning Wang, Sheng Chen

    Abstract: With the increasing demands from passengers for data-intensive services, millimeter-wave (mmWave) communication is considered as an effective technique to release the transmission pressure on high speed train (HST) networks. However, mmWave signals encounter severe losses when passing through the carriage, which decreases the quality of services on board. In this paper, we investigate an intelligen…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 13 pages, 7 figures, IEEE Internet of Things Journal

  46. arXiv:2311.16507

    cs.CV cs.LG

    Exploring Straighter Trajectories of Flow Matching with Diffusion Guidance

    Authors: Siyu Xing, Jie Cao, Huaibo Huang, Xiao-Yu Zhang, Ran He

    Abstract: Flow matching as a paradigm of generative model achieves notable success across various domains. However, existing methods use either multi-round training or knowledge within minibatches, posing challenges in finding a favorable coupling strategy for straight trajectories. To address this issue, we propose a novel approach, Straighter trajectories of Flow Matching (StraightFM). It straightens traj…

    Submitted 28 November, 2023; originally announced November 2023.

  47. arXiv:2310.16003

    cs.CV

    CVPR 2023 Text Guided Video Editing Competition

    Authors: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, **bin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola

    Abstract: Humans watch more than a billion hours of video per day. Most of this video was edited manually, which is a tedious process. However, AI-enabled video-generation and video-editing is on the rise. Building on text-to-image models like Stable Diffusion and Imagen, generative AI has improved dramatically on video tasks. But it's hard to evaluate progress in these video tasks because there is no stand…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Project page: https://sites.google.com/view/loveucvpr23/track4

  48. arXiv:2310.12429

    cs.IT eess.SP

    Reconfigurable Intelligent Surface Assisted High-Speed Train Communications: Coverage Performance Analysis and Placement Optimization

    Authors: Changzhu Liu, Ruisi He, Yong Niu, Zhu Han, Bo Ai, Meilin Gao, Zhangfeng Ma, Gongpu Wang, Zhangdui Zhong

    Abstract: Reconfigurable intelligent surface (RIS) emerges as an efficient and promising technology for the next wireless generation networks and has attracted a lot of attention owing to the capability of extending wireless coverage by reflecting signals toward targeted receivers. In this paper, we consider a RIS-assisted high-speed train (HST) communication system to enhance wireless coverage and improve…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 14 figures, accepted by IEEE Transactions on Vehicular Technology

  49. arXiv:2310.07702

    cs.CV

    ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

    Authors: Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

    Abstract: In this work, we investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes. In addition, the generated images should have arbitrary image aspect ratios. When generating images directly at a higher resolution, 1024 x 1024, with the pre-trained Stable Diffusion using training images of resolution 512 x 512, we observe p…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Project page: https://yingqinghe.github.io/scalecrafter/ Github: https://github.com/YingqingHe/ScaleCrafter

  50. arXiv:2310.05060

    cs.CV cs.AI

    Video-CSR: Complex Video Digest Creation for Visual-Language Models

    Authors: Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: We present a novel task and human annotated dataset for evaluating the ability for visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval). The dataset contains 4.8K YouTube video clips of 20-60 seconds in duration and covers a wide range of topics and interests. Each video clip corresponds to 5 indepen…

    Submitted 8 October, 2023; originally announced October 2023.