Showing 1–50 of 479 results for author: Dong, J

Searching in archive cs.
  1. arXiv:2407.00658  [pdf, other]

    cs.RO

    A Fast Online Omnidirectional Quadrupedal Jumping Framework Via Virtual-Model Control and Minimum Jerk Trajectory Generation

    Authors: Linzhu Yue, Lingwei Zhang, Zhitao Song, Hongbo Zhang, Jinhu Dong, Xuanqi Zeng, Yun-Hui Liu

    Abstract: Exploring the limits of quadruped robot agility, particularly in the context of rapid and real-time planning and execution of omnidirectional jump trajectories, presents significant challenges due to the complex dynamics involved, especially when considering significant impulse contacts. This paper introduces a new framework to enable fast, omnidirectional jumping capabilities for quadruped robots…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: IROS2024 paper,7 pages,8 figures

    MSC Class: 68T40 ACM Class: I.2.9

  2. arXiv:2407.00352  [pdf, other]

    cs.CV cs.AI

    PhyTracker: An Online Tracker for Phytoplankton

    Authors: Yang Yu, Qingxuan Lv, Yuezun Li, Zhiqiang Wei, Junyu Dong

    Abstract: Phytoplankton, a crucial component of aquatic ecosystems, requires efficient monitoring to understand marine ecological processes and environmental conditions. Traditional phytoplankton monitoring methods, relying on non-in situ observations, are time-consuming and resource-intensive, limiting timely analysis. To address these limitations, we introduce PhyTracker, an intelligent in situ tracking f…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13pages,eleven figures

  3. arXiv:2406.18941  [pdf, other]

    cs.CV

    CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation

    Authors: Zuo Zuo, Jiahao Dong, Yao Wu, Yanyun Qu, Zongze Wu

    Abstract: Few-shot anomaly detection methods can effectively address data collecting difficulty in industrial scenarios. Compared to 2D few-shot anomaly detection (2D-FSAD), 3D few-shot anomaly detection (3D-FSAD) is still an unexplored but essential task. In this paper, we propose CLIP3D-AD, an efficient 3D-FSAD method extended on CLIP. We successfully transfer strong generalization ability of CLIP into 3D…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures

  4. arXiv:2406.18616  [pdf, other]

    cs.SE cs.AI cs.CL

    Towards Large Language Model Aided Program Refinement

    Authors: Yufan Cai, Zhe Hou, Xiaokun Luan, David Miguel Sanan Baena, Yun Lin, Jun Sun, Jin Song Dong

    Abstract: Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional verification tool support for program refinement is highly interactive and lacks automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generations from informal natural language specifications. However…

    Submitted 26 June, 2024; originally announced June 2024.

    ACM Class: K.6.3

  5. arXiv:2406.16422  [pdf, other]

    cs.CV cs.AI

    Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting

    Authors: Tiange Zhang, Qing Cai, Feng Gao, Lin Qi, Junyu Dong

    Abstract: Cross-Domain Few-Shot Learning has witnessed great stride with the development of meta-learning. However, most existing methods pay more attention to learning domain-adaptive inductive bias (meta-knowledge) through feature-wise manipulation or task diversity improvement while neglecting the phenomenon that deep networks tend to rely more on high-frequency cues to make the classification decision,…

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.08426  [pdf, other]

    cs.CL cs.AI cs.DB

    Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

    Authors: Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang

    Abstract: Generating accurate SQL according to natural language questions (text-to-SQL) is a long-standing challenge due to the complexities involved in user question understanding, database schema comprehension, and SQL generation. Conventional text-to-SQL systems, comprising human engineering and deep neural networks, have made substantial progress. Subsequently, pre-trained language models (PLMs) have be…

    Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.04594  [pdf, other]

    cs.DC cs.AI cs.LG

    Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

    Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

    Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the…

    Submitted 6 June, 2024; originally announced June 2024.

  8. arXiv:2406.01140  [pdf, other]

    cs.AI

    Logical Reasoning with Relation Network for Inductive Knowledge Graph Completion

    Authors: Qinggang Zhang, Keyu Duan, Junnan Dong, Pai Zheng, Xiao Huang

    Abstract: Inductive knowledge graph completion (KGC) aims to infer the missing relation for a set of newly-coming entities that never appeared in the training set. Such a setting is more in line with reality, as real-world KGs are constantly evolving and introducing new knowledge. Recent studies have shown promising results using message passing over subgraphs to embed newly-coming entities for inductive KG…

    Submitted 3 June, 2024; originally announced June 2024.

  9. arXiv:2406.00773  [pdf, other]

    cs.LG cs.CV

    Diffusion Tuning: Transferring Diffusion Models via Chain of Forgetting

    Authors: Jincheng Zhong, Xingzhuo Guo, Jiaxiang Dong, Mingsheng Long

    Abstract: Diffusion models have significantly advanced the field of generative modeling. However, training a diffusion model is computationally expensive, creating a pressing need to adapt off-the-shelf diffusion models for downstream generation tasks. Current fine-tuning methods focus on parameter-efficient transfer learning but overlook the fundamental transfer characteristics of diffusion models. In this…

    Submitted 6 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  10. arXiv:2406.00449  [pdf, other]

    eess.IV cs.CV

    Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging

    Authors: Jiahua Dong, Hui Yin, Hongliu Li, Wenbo Li, Yulun Zhang, Salman Khan, Fahad Shahbaz Khan

    Abstract: Deep unfolding methods have made impressive progress in restoring 3D hyperspectral images (HSIs) from 2D measurements through convolution neural networks or Transformers in spectral compressive imaging. However, they cannot efficiently capture long-range dependencies using global receptive fields, which significantly limits their performance in HSI reconstruction. Moreover, these methods may suffe…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  11. arXiv:2405.20771  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Towards Black-Box Membership Inference Attack for Diffusion Models

    Authors: Jingwei Li, Jing Dong, Tianxing He, Jingzhao Zhang

    Abstract: Identifying whether an artwork was used to train a diffusion model is an important research topic, given the rising popularity of AI-generated art and the associated copyright concerns. The work approaches this problem from the membership inference attack (MIA) perspective. We first identify the limitations of applying existing MIA methods for copyright protection: the required access of internal…

    Submitted 25 May, 2024; originally announced May 2024.

  12. arXiv:2405.18414  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    Don't Forget to Connect! Improving RAG with Graph-based Reranking

    Authors: Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F. Yang, Anton Tsitsulin

    Abstract: Retrieval Augmented Generation (RAG) has greatly improved the performance of Large Language Model (LLM) responses by grounding generation with context from existing documents. These systems work well when documents are clearly relevant to a question context. But what about when a document has partial information, or less obvious connections to the context? And how should we reason about connection…

    Submitted 28 May, 2024; originally announced May 2024.

  13. arXiv:2405.17337  [pdf, other]

    cs.CL cs.AI

    Cost-efficient Knowledge-based Question Answering with Large Language Models

    Authors: Junnan Dong, Qinggang Zhang, Chuang Zhou, Hao Chen, Daochen Zha, Xiao Huang

    Abstract: Knowledge-based question answering (KBQA) is widely used in many scenarios that necessitate domain knowledge. Large language models (LLMs) bring opportunities to KBQA, while their costs are significantly higher and absence of domain-specific knowledge during pre-training. We are motivated to combine LLMs and prior small models on knowledge graphs (KGMs) for both inferential accuracy and cost savin…

    Submitted 27 May, 2024; originally announced May 2024.

  14. arXiv:2405.16806  [pdf, other]

    cs.CL cs.AI

    Entity Alignment with Noisy Annotations from Large Language Models

    Authors: Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang

    Abstract: Entity alignment (EA) aims to merge two knowledge graphs (KGs) by identifying equivalent entity pairs. While existing methods heavily rely on human-generated labels, it is prohibitively expensive to incorporate cross-domain experts for annotation in real-world scenarios. The advent of Large Language Models (LLMs) presents new avenues for automating EA with annotations, inspired by their comprehens…

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  15. arXiv:2405.15334  [pdf, other]

    cs.CL

    Detection and Positive Reconstruction of Cognitive Distortion sentences: Mandarin Dataset and Evaluation

    Authors: Shuya Lin, Yuxiong Wang, Jonathan Dong, Shiguang Ni

    Abstract: This research introduces a Positive Reconstruction Framework based on positive psychology theory. Overcoming negative thoughts can be challenging, our objective is to address and reframe them through a positive reinterpretation. To tackle this challenge, a two-fold approach is necessary: identifying cognitive distortions and suggesting a positively reframed alternative while preserving the origina…

    Submitted 24 May, 2024; originally announced May 2024.

  16. arXiv:2405.15322  [pdf, other]

    cs.CR cs.AR

    Dishonest Approximate Computing: A Coming Crisis for Cloud Clients

    Authors: Ye Wang, Jian Dong, Ming Han, Jin Wu, Gang Qu

    Abstract: Approximate Computing (AC) has emerged as a promising technique for achieving energy-efficient architectures and is expected to become an effective technique for reducing the electricity cost for cloud service providers (CSP). However, the potential misuse of AC has not received adequate attention, which is a coming crisis behind the blueprint of AC. Driven by the pursuit of illegal financial prof…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures

  17. arXiv:2405.15135  [pdf, other]

    cs.LG

    Exploring the Evolution of Hidden Activations with Live-Update Visualization

    Authors: Xianglin Yang, Jin Song Dong

    Abstract: Monitoring the training of neural networks is essential for identifying potential data anomalies, enabling timely interventions and conserving significant computational resources. Apart from the commonly used metrics such as losses and validation accuracies, the hidden representation could give more insight into the model progression. To this end, we introduce SentryCam, an automated, real-time vi…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint

  18. arXiv:2405.14343  [pdf, other]

    cs.CV

    Efficient Visual State Space Model for Image Deblurring

    Authors: Lingshun Kong, Jiangxin Dong, Ming-Hsuan Yang, Jinshan Pan

    Abstract: Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. ViTs typically yield superior results in image restoration compared to CNNs due to their ability to capture long-range dependencies and input-dependent characteristics. However, the computational complexity of Transformer-based models grows quadratically with the image reso…

    Submitted 23 May, 2024; originally announced May 2024.

  19. arXiv:2405.14169  [pdf, other]

    cs.CV

    Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography

    Authors: Nhat Chung, Sensen Gao, Tuan-Anh Vu, Jie Zhang, Aishan Liu, Yun Lin, Jin Song Dong, Qing Guo

    Abstract: Vision-Large-Language-Models (Vision-LLMs) are increasingly being integrated into autonomous driving (AD) systems due to their advanced visual-language reasoning capabilities, targeting the perception, prediction, planning, and control mechanisms. However, Vision-LLMs have demonstrated susceptibilities against various types of adversarial attacks, which would compromise their reliability and safet…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages, 5 tables, 5 figures, work in progress

  20. arXiv:2405.13900  [pdf, other]

    cs.LG cs.CV

    Rehearsal-free Federated Domain-incremental Learning

    Authors: Rui Sun, Haoran Duan, Jiahua Dong, Varun Ojha, Tejal Shah, Rajiv Ranjan

    Abstract: We introduce a rehearsal-free federated domain incremental learning framework, RefFiL, based on a global prompt-sharing paradigm to alleviate catastrophic forgetting challenges in federated domain-incremental learning, where unseen domains are continually learned. Typical methods for mitigating forgetting, such as the use of additional datasets and the retention of private data from earlier tasks,…

    Submitted 22 May, 2024; originally announced May 2024.

  21. arXiv:2405.06916  [pdf, other]

    cs.CV

    High-order Neighborhoods Know More: HyperGraph Learning Meets Source-free Unsupervised Domain Adaptation

    Authors: Jinkun Jiang, Qingxuan Lv, Yuezun Li, Yong Du, Sheng Chen, Hui Yu, Junyu Dong

    Abstract: Source-free Unsupervised Domain Adaptation (SFDA) aims to classify target samples by only accessing a pre-trained source model and unlabelled target samples. Since no source data is available, transferring the knowledge from the source domain to the target domain is challenging. Existing methods normally exploit the pair-wise relation among target samples and attempt to discover their correlations…

    Submitted 11 May, 2024; originally announced May 2024.

  22. arXiv:2405.00074  [pdf, other]

    cs.LG cs.SE

    PAODING: A High-fidelity Data-free Pruning Toolkit for Debloating Pre-trained Neural Networks

    Authors: Mark Huasong Meng, Hao Guan, Liuhuo Wan, Sin Gee Teo, Guangdong Bai, Jin Song Dong

    Abstract: We present PAODING, a toolkit to debloat pretrained neural network models through the lens of data-free pruning. To preserve the model fidelity, PAODING adopts an iterative process, which dynamically measures the effect of deleting a neuron to identify candidates that have the least impact to the output layer. Our evaluation shows that PAODING can significantly reduce the model size, generalize on…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 3 pages

  23. arXiv:2404.19382  [pdf, other]

    cs.CV

    Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective

    Authors: Xiaoxuan Han, Songlin Yang, Wei Wang, Yang Li, Jing Dong

    Abstract: Advanced text-to-image diffusion models raise safety concerns regarding identity privacy violation, copyright infringement, and Not Safe For Work content generation. Towards this, unlearning methods have been developed to erase these involved concepts from diffusion models. However, these unlearning methods only shift the text-to-image mapping and preserve the visual content within the generative…

    Submitted 30 April, 2024; originally announced April 2024.

  24. arXiv:2404.16748  [pdf, other]

    cs.CV

    TELA: Text to Layer-wise 3D Clothed Human Generation

    Authors: Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

    Abstract: This paper addresses the task of 3D clothed human generation from textural descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle for clothing editing and meanwhile lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed huma…

    Submitted 25 April, 2024; originally announced April 2024.

  25. arXiv:2404.16612  [pdf, other]

    cs.CV

    MuseumMaker: Continual Style Customization without Catastrophic Forgetting

    Authors: Chenxi Liu, Gan Sun, Wenqi Liang, Jiahua Dong, Can Qin, Yang Cong

    Abstract: Pre-trained large text-to-image (T2I) models with an appropriate text prompt has attracted growing interests in customized images generation field. However, catastrophic forgetting issue make it hard to continually synthesize new user-provided styles while retaining the satisfying results amongst learned styles. In this paper, we propose MuseumMaker, a method that enables the synthesis of images b…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  26. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  27. arXiv:2404.14852  [pdf, other]

    cs.CV

    Ultrasound Nodule Segmentation Using Asymmetric Learning with Simple Clinical Annotation

    Authors: Xingyue Zhao, Zhongyu Li, Xiangde Luo, Peiqi Li, Peng Huang, Jianwei Zhu, Yang Liu, Jihua Zhu, Meng Yang, Shi Chang, Jun Dong

    Abstract: Recent advances in deep learning have greatly facilitated the automated segmentation of ultrasound images, which is essential for nodule morphological analysis. Nevertheless, most existing methods depend on extensive and precise annotations by domain experts, which are labor-intensive and time-consuming. In this study, we suggest using simple aspect ratio annotations directly from ultrasound clini…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by TCSVT

  28. arXiv:2404.13873  [pdf, other]

    cs.CV

    Texture-aware and Shape-guided Transformer for Sequential DeepFake Detection

    Authors: Yunfei Li, Yuezun Li, Xin Wang, Jiaran Zhou, Junyu Dong

    Abstract: Sequential DeepFake detection is an emerging task that aims to predict the manipulation sequence in order. Existing methods typically formulate it as an image-to-sequence problem, employing conventional Transformer architectures for detection. However, these methods lack dedicated design and consequently result in limited performance. In this paper, we propose a novel Texture-aware and Shape-guide…

    Submitted 6 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  29. arXiv:2404.13872  [pdf, other]

    cs.CV

    FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge

    Authors: Hanzhe Li, Yuezun Li, Jiaran Zhou, Bin Li, Junyu Dong

    Abstract: Generating synthetic fake faces, known as pseudo-fake faces, is an effective way to improve the generalization of DeepFake detection. Existing methods typically generate these faces by blending real or fake faces in color space. While these methods have shown promise, they overlook the simulation of frequency distribution in pseudo-fake faces, limiting the learning of generic forgery traces in-dep…

    Submitted 6 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  30. arXiv:2404.13056  [pdf, other]

    cs.LG cs.CE stat.CO stat.ME stat.ML

    Variational Bayesian Optimal Experimental Design with Normalizing Flows

    Authors: Jiayuan Dong, Christian Jacobsen, Mehdi Khalloufi, Maryam Akram, Wanjiao Liu, Karthik Duraisamy, Xun Huan

    Abstract: Bayesian optimal experimental design (OED) seeks experiments that maximize the expected information gain (EIG) in model parameters. Directly estimating the EIG using nested Monte Carlo is computationally expensive and requires an explicit likelihood. Variational OED (vOED), in contrast, estimates a lower bound of the EIG without likelihood evaluations by approximating the posterior distributions w…

    Submitted 8 April, 2024; originally announced April 2024.

    MSC Class: 62K05; 94A17; 62C10; 62F15

  31. arXiv:2404.12678  [pdf, other]

    cs.CV

    Exploring Interactive Semantic Alignment for Efficient HOI Detection with Vision-language Model

    Authors: Jihao Dong, Renjie Pan, Hua Yang

    Abstract: Human-Object Interaction (HOI) detection aims to localize human-object pairs and comprehend their interactions. Recently, two-stage transformer-based methods have demonstrated competitive performance. However, these methods frequently focus on object appearance features and ignore global contextual information. Besides, vision-language model CLIP which effectively aligns visual and text embeddings…

    Submitted 24 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  32. arXiv:2404.11519  [pdf, other]

    cs.IR

    Disentangled Cascaded Graph Convolution Networks for Multi-Behavior Recommendation

    Authors: Zhiyong Cheng, Jianhua Dong, Fan Liu, Lei Zhu, Xun Yang, Meng Wang

    Abstract: Multi-behavioral recommender systems have emerged as a solution to address data sparsity and cold-start issues by incorporating auxiliary behaviors alongside target behaviors. However, existing models struggle to accurately capture varying user preferences across different behaviors and fail to account for diverse item preferences within behaviors. Various user preference factors (such as price or…

    Submitted 17 April, 2024; originally announced April 2024.

  33. arXiv:2404.11054  [pdf, other]

    cs.CV

    Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection

    Authors: Ying Zhang, Yuezun Li, Bo Peng, Jiaran Zhou, Huiyu Zhou, Junyu Dong

    Abstract: The task of video inpainting detection is to expose the pixel-level inpainted regions within a video sequence. Existing methods usually focus on leveraging spatial and temporal inconsistencies. However, these methods typically employ fixed operations to combine spatial and temporal clues, limiting their applicability in different scenarios. In this paper, we introduce a novel Multilateral Temporal…

    Submitted 6 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  34. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  35. arXiv:2404.08341  [pdf, other]

    cs.CV

    Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

    Authors: Yang Li, Songlin Yang, Wei Wang, Ziwen He, Bo Peng, Jing Dong

    Abstract: Highly realistic AI generated face forgeries known as deepfakes have raised serious social concerns. Although DNN-based face forgery detection models have achieved good performance, they are vulnerable to latest generative methods that have less forgery traces and adversarial attacks. This limitation of generalization and robustness hinders the credibility of detection results and requires more ex…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to ICME2024

  36. RMAFF-PSN: A Residual Multi-Scale Attention Feature Fusion Photometric Stereo Network

    Authors: Kai Luo, Yakun Ju, Lin Qi, Kaixuan Wang, Junyu Dong

    Abstract: Predicting accurate normal maps of objects from two-dimensional images in regions of complex structure and spatial material variations is challenging using photometric stereo methods due to the influence of surface reflection properties caused by variations in object geometry and surface materials. To address this issue, we propose a photometric stereo network called a RMAFF-PSN that uses residual…

    Submitted 14 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: 17 pages,12 figures

    Journal ref: Photonics 2023,10(5),548

  37. arXiv:2404.06516  [pdf, other]

    cs.GT cs.LG

    Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games

    Authors: Jing Dong, Baoxiang Wang, Yaoliang Yu

    Abstract: In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to the Nash equilibrium while attaining sublinear regret for each individual player. Our algorithm simultaneously achieves a Nash regret and a regret bou…

    Submitted 3 April, 2024; originally announced April 2024.

  38. arXiv:2404.06251  [pdf, other]

    cs.CV

    ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization

    Authors: Yixin Yang, Jiangxin Dong, Jinhui Tang, Jinshan Pan

    Abstract: How to effectively explore spatial-temporal features is important for video colorization. Instead of stacking multiple frames along the temporal dimension or recurrently propagating estimated features that will accumulate errors or cannot explore information from far-apart frames, we develop a memory-based feature propagation module that can establish reliable connections with features from far-ap…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Project website: https://github.com/yyang181/colormnet

  39. arXiv:2404.04745  [pdf, other]

    cs.CV

    Collaborative Feedback Discriminative Propagation for Video Super-Resolution

    Authors: Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan

    Abstract: The key success of existing video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information, which is usually achieved by a recurrent propagation module with an alignment module. However, inaccurate alignment usually leads to aligned features with significant artifacts, which will be accumulated during propagation and thus affect video restoration. Moreover, propa…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Project website: https://github.com/House-Leo/CFDVSR

  40. arXiv:2404.02562  [pdf, other]

    cs.CV

    Representation Alignment Contrastive Regularization for Multi-Object Tracking

    Authors: Zhonglin Liu, Shujie Chen, Jianfeng Dong, Xun Wang, Di Zhou

    Abstract: Achieving high-performance in multi-object tracking algorithms heavily relies on modeling spatio-temporal relationships during the data association stage. Mainstream approaches encompass rule-based and deep learning-based methods for spatio-temporal relationship modeling. While the former relies on physical motion laws, offering wider applicability but yielding suboptimal results for complex objec…

    Submitted 17 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  41. arXiv:2404.01547  [pdf, other]

    cs.CV

    Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

    Authors: Xiang Chen, Jinshan Pan, Jiangxin Dong

    Abstract: How to effectively explore multi-scale representations of rain streaks is important for image deraining. In contrast to existing Transformer-based methods that depend mostly on single-scale rain appearance, we develop an end-to-end multi-scale Transformer that leverages the potentially useful features in various scales to facilitate high-quality image reconstruction. To better explore the common d…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Project website: https://github.com/cschenxiang/NeRD-Rain

    Journal ref: CVPR 2024

  42. arXiv:2404.00230  [pdf, other]

    cs.CV

    Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space

    Authors: Zheling Meng, Bo Peng, Jing Dong

    Abstract: Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of watermark robustness and image quality. The reason for this dilemma is that watermark detection is performed in pixel space, implying an intrinsic link between image quality and watermark robustness. In this paper, we highlight that an effective solu…

    Submitted 29 March, 2024; originally announced April 2024.

  43. arXiv:2403.16362  [pdf, other]

    cs.SE

    AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

    Authors: Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, Xiaoguang Mao

    Abstract: Fault Localization (FL) is an essential step during the debugging process. With the strong capabilities of code comprehension, the recent Large Language Models (LLMs) have demonstrated promising performance in diagnosing bugs in the code. Nevertheless, due to LLMs' limited performance in handling long contexts, existing LLM-based fault localization remains on localizing bugs within a small code sc…

    Submitted 24 March, 2024; originally announced March 2024.

  44. arXiv:2403.14023  [pdf]

    cs.CR

    A system capable of verifiably and privately screening global DNA synthesis

    Authors: Carsten Baum, Jens Berlips, Walther Chen, Hongrui Cui, Ivan Damgard, Jiangbin Dong, Kevin M. Esvelt, Mingyu Gao, Dana Gretton, Leonard Foner, Martin Kysel, Kaiyi Zhang, Juanru Li, Xiang Li, Omer Paneth, Ronald L. Rivest, Francesca Sage-Ling, Adi Shamir, Yue Shen, Meicen Sun, Vinod Vaikuntanathan, Lynn Van Hauwe, Theia Vogel, Benjamin Weinstein-Raun, Yun Wang, et al. (5 additional authors not shown)

    Abstract: Printing custom DNA sequences is essential to scientific and biomedical research, but the technology can be used to manufacture plagues as well as cures. Just as ink printers recognize and reject attempts to counterfeit money, DNA synthesizers and assemblers should deny unauthorized requests to make viral DNA that could be used to ignite a pandemic. There are three complications. First, we don't n…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Main text 10 pages, 4 figures. 5 supplementary figures. Total 21 pages. Direct correspondence to: Ivan B. Damgard ([email protected]), Andrew C. Yao ([email protected]), Kevin M. Esvelt ([email protected])

  45. arXiv:2403.11624  [pdf, other]

    cs.IR cs.LG

    Dual-Channel Multiplex Graph Neural Networks for Recommendation

    Authors: Xiang Li, Chaofan Fu, Zhongying Zhao, Guanjie Zheng, Chao Huang, Junyu Dong, Yanwei Yu

    Abstract: Efficient recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interaction relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping pla…

    Submitted 29 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  46. arXiv:2403.11172  [pdf, other]

    cs.CV

    Artifact Feature Purification for Cross-domain Detection of AI-generated Images

    Authors: Zheling Meng, Bo Peng, Jing Dong, Tieniu Tan

    Abstract: In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, bring potential security risks to our society. Existing generated image detection methods suffer from performance drop when faced with out-of-domain generators and image scenes. To relieve this problem, we propose Artifact Purification Network (APN) to facilitate the artifact extraction fr…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: This work is under consideration at Computer Vision and Image Understanding

  47. arXiv:2403.10067  [pdf, other]

    eess.IV cs.CV

    Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising

    Authors: Shuai Hu, Feng Gao, Xiaowei Zhou, Junyu Dong, Qian Du

    Abstract: Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data. However, simultaneously modeling global and local features is rarely explored to enhance HSI denoising. In this letter, we propose a hybrid convolution and attention network (HCANet), which leverages both the strengths of convolution neural networks (CNNs) and Transformers. To enhan…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: IEEE GRSL 2024

  48. arXiv:2403.01387  [pdf, other]

    cs.LG cs.DC

    A Comprehensive Survey of Federated Transfer Learning: Challenges, Methods and Applications

    Authors: Wei Guo, Fuzhen Zhuang, Xiao Zhang, Yiqi Tong, Jin Dong

    Abstract: Federated learning (FL) is a novel distributed machine learning paradigm that enables participants to collaboratively train a centralized model with privacy preservation by eliminating the requirement of data sharing. In practice, FL often involves multiple participants and requires the third party to aggregate global information to guide the update of the target participant. Therefore, many FL me…

    Submitted 2 March, 2024; originally announced March 2024.

  49. arXiv:2403.00336  [pdf, other]

    cs.RO cs.AI

    Never-Ending Behavior-Cloning Agent for Robotic Manipulation

    Authors: Wenqi Liang, Gan Sun, Qian He, Yu Ren, Jiahua Dong, Yang Cong

    Abstract: Relying on multi-modal observations, embodied robots could perform multiple robotic manipulation tasks in unstructured real-world environments. However, most language-conditioned behavior-cloning agents still face existing long-standing challenges, i.e., 3D scene representation and human-level task learning, when adapting into new sequential tasks in practical scenarios. We here investigate these…

    Submitted 7 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 17 pages, 6 figures, 9 tables

  50. arXiv:2402.19072  [pdf, other]

    cs.LG cs.AI

    TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

    Authors: Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Yunzhong Qiu, Haoran Zhang, Jianmin Wang, Mingsheng Long

    Abstract: Recent studies have demonstrated remarkable performance in time series forecasting. However, due to the partially-observed nature of real-world applications, solely focusing on the target of interest, so-called endogenous variables, is usually insufficient to guarantee accurate forecasting. Notably, a system is often recorded into multiple variables, where the exogenous series can provide valuable…

    Submitted 29 February, 2024; originally announced February 2024.