Skip to main content

Showing 1–50 of 418 results for author: Peng, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01067  [pdf, other

    cs.AI cs.CL cs.CV cs.HC cs.LG

    Human-like object concept representations emerge naturally in multimodal large language models

    Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, **peng Li, Shuang Qiu, Le Chang, Huiguang He

    Abstract: The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vas… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.19776  [pdf, other

    cs.MM cs.IR

    MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection

    Authors: Hongzhen Lv, Wenzhong Yang, Fuyuan Wei, Jiaren Peng, Haokun Geng

    Abstract: Fake news detection has received increasing attention from researchers in recent years, especially multi-modal fake news detection containing both text and images. However, many previous works have fed two modal features, text and image, into a binary classifier after a simple concatenation or attention mechanism, in which the features contain a large amount of noise inherent in the data,which in… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.19640  [pdf, other

    cs.CV

    Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

    Authors: Quanmin Liang, Zhilin Huang, Xiawu Zheng, Feidiao Yang, Jun Peng, Kai Huang, Yonghong Tian

    Abstract: Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detail loss and inefficiency. To address these issues, we propose an efficient Recursive Multi-Branch Information Fusion Network (RMFNet) that separates po… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  4. arXiv:2406.14503  [pdf, other

    cs.CL

    Overview of the CAIL 2023 Argument Mining Track

    Authors: **gcong Liang, Junlong Wang, Xinyu Zhai, Yungui Zhuang, Yiyang Zheng, Xin Xu, Xiandong Ran, Xiaozheng Dong, Honghui Rong, Yanlun Liu, Hao Chen, Yuhan Wei, Donghai Li, Jiajie Peng, Xuan**g Huang, Chongde Shi, Yansong Feng, Yun Song, Zhongyu Wei

    Abstract: We give a detailed overview of the CAIL 2023 Argument Mining Track, one of the Chinese AI and Law Challenge (CAIL) 2023 tracks. The main goal of the track is to identify and extract interacting argument pairs in trial dialogs. It mainly uses summarized judgment documents but can also refer to trial recordings. The track consists of two stages, and we introduce the tasks designed for each stage; we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.13185  [pdf, other

    cs.CL

    Learnable In-Context Vector for Visual Question Answering

    Authors: Yingzhe Peng, Chenduo Hao, Xu Yang, Jiawei Peng, Xinting Hu, Xin Geng

    Abstract: As language models continue to scale, Large Language Models (LLMs) have exhibited emerging capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs) as context. Inspired by these advancements, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, applying ICL us… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2406.13170  [pdf, other

    cs.AI cs.CL

    Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style

    Authors: Ze** Li, Xinlong Yang, Ziheng Gao, Ji Liu, Zhuang Liu, Dong Li, **zhang Peng, Lu Tian, Emad Barsoum

    Abstract: Large Language Models (LLMs) inherently use autoregressive decoding, which lacks parallelism in inference and results in significantly slow inference speeds, especially when hardware parallel accelerators and memory bandwidth are not fully utilized. In this work, we propose Amphista, a speculative decoding algorithm that adheres to a non-autoregressive decoding paradigm. Owing to the increased par… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  7. Tracing the Unseen: Uncovering Human Trafficking Patterns in Job Listings

    Authors: Siyi Zhou, Jiankun Peng, Emilio Ferrara

    Abstract: In the shadow of the digital revolution, the insidious issue of human trafficking has found new breeding grounds within the realms of social media and online job boards. Previous research efforts have predominantly centered on identifying victims via the analysis of escort advertisements. However, our work shifts the focus towards enabling a proactive approach: pinpointing potential traffickers be… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.11643  [pdf, other

    cs.CV

    AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

    Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, **long Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

    Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyM… ▽ More

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.05475  [pdf, other

    cs.CV cs.GR eess.IV

    HDRT: Infrared Capture for HDR Imaging

    Authors: **gchao Peng, Thomas Bashford-Rogers, Francesco Banterle, Haitao Zhao, Kurt Debattista

    Abstract: Capturing real world lighting is a long standing challenge in imaging and most practical methods acquire High Dynamic Range (HDR) images by either fusing multiple exposures, or boosting the dynamic range of Standard Dynamic Range (SDR) images. Multiple exposure capture is problematic as it requires longer capture times which can often lead to ghosting problems. The main alternative, inverse tone m… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  10. arXiv:2406.04628  [pdf, other

    cs.CE q-bio.QM

    Projecting Molecules into Synthesizable Chemical Spaces

    Authors: Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, Jianzhu Ma

    Abstract: Discovering new drug molecules is a pivotal yet challenging process due to the near-infinitely large chemical space and notorious demands on time and resources. Numerous generative models have recently been introduced to accelerate the drug discovery process, but their progression to experimental validation remains limited, largely due to a lack of consideration for synthetic accessibility in prac… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  11. arXiv:2406.02263  [pdf, other

    cs.CV

    M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising

    Authors: Chengjie Wang, Haokun Zhu, **long Peng, Yue Wang, Ran Yi, Yunsheng Wu, Lizhuang Ma, Jiangning Zhang

    Abstract: Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. Yet, both RGB and 3D data are crucial for anomaly detection, and the datasets are seldom completely clean in practical scenarios. To address above challenges, this paper initially delves into the RGB-3D multi-modal noisy anomaly detection, proposing a novel noise-resistant M3DM-NR… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  12. arXiv:2406.01006  [pdf, other

    cs.CL cs.AI cs.SE

    SemCoder: Training Code Language Models with Comprehensive Semantics

    Authors: Yangruibo Ding, **jun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray

    Abstract: Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy to train Code LLMs with c… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2406.00936  [pdf, other

    cs.CL

    A Survey of Useful LLM Evaluation

    Authors: Ji-Lun Peng, Sijia Cheng, Egil Diau, Yung-Yu Shih, Po-Heng Chen, Yen-Ting Lin, Yun-Nung Chen

    Abstract: LLMs have gotten attention across various research domains due to their exceptional performance on a wide range of complex tasks. Therefore, refined methods to evaluate the capabilities of LLMs are needed to determine the tasks and responsibility they should undertake. Our study mainly discussed how LLMs, as useful tools, should be effectively assessed. We proposed the two-stage framework: from ``… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  14. arXiv:2406.00735  [pdf, other

    q-bio.BM cs.AI cs.LG

    Full-Atom Peptide Design based on Multi-modal Flow Matching

    Authors: Jiahan Li, Chaoran Cheng, Zuofan Wu, Ruihan Guo, Shitong Luo, Zhizhou Ren, Jian Peng, Jianzhu Ma

    Abstract: Peptides, short chains of amino acid residues, play a vital role in numerous biological processes by interacting with other target molecules, offering substantial potential in drug discovery. In this work, we present PepFlow, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides that target specific protein receptors. Drawing inspi… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  15. arXiv:2405.20600  [pdf, other

    cs.AI

    Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning

    Authors: Kaicheng Fu, Changde Du, Xiaoyu Chen, Jie Peng, Huiguang He

    Abstract: Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario, where human experience a blend of multiple emotions which are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty in solving MLCIL issue due to notorious cata… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  16. arXiv:2405.18881  [pdf, other

    cs.LG cs.AI

    Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization

    Authors: Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang

    Abstract: In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment appr… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  17. arXiv:2405.18315  [pdf, other

    cs.AI cs.PL

    DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data

    Authors: Bin Wang, Linke Ouyang, Fan Wu, Wenchang Ning, Xiao Han, Zhiyuan Zhao, Jiahui Peng, Yiying Jiang, Dahua Lin, Conghui He

    Abstract: In the era of artificial intelligence, the diversity of data modalities and annotation formats often renders data unusable directly, requiring understanding and format conversion before it can be used by researchers or developers with different needs. To tackle this problem, this article introduces a framework called Dataset Description Language (DSDL) that aims to simplify dataset processing by p… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  18. arXiv:2405.17718  [pdf, other

    cs.CV cs.LG

    AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval

    Authors: Sihe Zhang, Qingdong He, **long Peng, Yuxi Li, Zhengkai Jiang, Jiafu Wu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Image retrieval aims to identify visually similar images within a database using a given query image. Traditional methods typically employ both global and local features extracted from images for matching, and may also apply re-ranking techniques to enhance accuracy. However, these methods often fail to account for the noise present in query images, which can stem from natural or human-induced fac… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  19. arXiv:2405.16441  [pdf, other

    cs.LG stat.ML

    Categorical Flow Matching on Statistical Manifolds

    Authors: Chaoran Cheng, Jiahan Li, Jian Peng, Ge Liu

    Abstract: We introduce Statistical Flow Matching (SFM), a novel and mathematically rigorous flow-matching framework on the manifold of parameterized probability measures inspired by the results from information geometry. We demonstrate the effectiveness of our method on the discrete generation problem by instantiating SFM on the manifold of categorical distributions whose geometric properties remain unexplo… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  20. arXiv:2405.15705  [pdf, other

    cs.AR eess.SY

    Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates

    Authors: **bo Peng, Zhe Chen, Zheng Lin, Haoxuan Yuan, Zihan Fang, Lingzhong Bao, Zihang Song, Ying Li, **g Ren, Yue Gao

    Abstract: Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals distribute in a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer for extracting physical layer information, such as structure of packet, with low sampling rate (especially, sub-Nyquist… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures

  21. arXiv:2405.15542  [pdf, other

    cs.NI cs.DC cs.LG eess.SP

    SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing

    Authors: Haoxuan Yuan, Zhe Chen, Zheng Lin, **bo Peng, Zihan Fang, Yuhang Zhong, Zihang Song, Yue Gao

    Abstract: Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, limited spectrum resources will not be allocated enough. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential.… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 16 figures

  22. arXiv:2405.15214  [pdf, other

    cs.CV

    PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

    Authors: Qingdong He, Jiangning Zhang, **long Peng, Haoyang He, Yabiao Wang, Chengjie Wang

    Abstract: Transformers have revolutionized the point cloud learning task, but the quadratic complexity hinders its extension to long sequence and makes a burden on limited computational resources. The recent advent of RWKV, a fresh breed of deep sequence models, has shown immense potential for sequence modeling in NLP tasks. In this paper, we present PointRWKV, a model of linear complexity derived from the… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  23. arXiv:2405.10597  [pdf, other

    cs.LG cs.AI cs.CL

    UniCL: A Universal Contrastive Learning Framework for Large Time Series Models

    Authors: Jiawei Li, **gshu Peng, Haoyang Li, Lei Chen

    Abstract: Time-series analysis plays a pivotal role across a range of critical applications, from finance to healthcare, which involves various tasks, such as forecasting and classification. To handle the inherent complexities of time-series data, such as high dimensionality and noise, traditional supervised learning methods first annotate extensive labels for time-series data in each task, which is very co… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  24. arXiv:2405.06696  [pdf, other

    cs.CL cs.AI

    Multi-level Shared Knowledge Guided Learning for Knowledge Graph Completion

    Authors: Yongxue Shan, Jie Zhou, Jie Peng, Xin Zhou, Jiaqian Yin, Xiaodong Wang

    Abstract: In the task of Knowledge Graph Completion (KGC), the existing datasets and their inherent subtasks carry a wealth of shared knowledge that can be utilized to enhance the representation of knowledge triplets and overall performance. However, no current studies specifically address the shared knowledge within KGC. To bridge this gap, we introduce a multi-level Shared Knowledge Guided learning method… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: The paper has been accepted for publication at TACL. And the arXiv version is a pre-MIT Press publication version

  25. arXiv:2405.05691  [pdf, other

    cs.CV cs.MM

    StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework

    Authors: Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng

    Abstract: Thanks to the powerful generative capacity of diffusion models, recent years have witnessed rapid progress in human motion generation. Existing diffusion-based methods employ disparate network architectures and training strategies. The effect of the design of each component is still unclear. In addition, the iterative denoising process consumes considerable computational overhead, which is prohibi… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  26. arXiv:2405.05589  [pdf, other

    cs.RO

    Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

    Authors: Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng, Yu Zhang, Jianmin Ji, Yanyong Zhang

    Abstract: Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 19 pages, 19 figures

  27. arXiv:2405.04966  [pdf, other

    cs.IT cs.CV cs.MA

    Communication-Efficient Collaborative Perception via Information Filling with Codebook

    Authors: Yue Hu, Juntong Peng, Sifei Liu, Junhao Ge, Si Liu, Siheng Chen

    Abstract: Collaborative perception empowers each agent to improve its perceptual ability through the exchange of perceptual messages with other agents. It inherently results in a fundamental trade-off between perception ability and communication cost. To address this bottleneck issue, our core idea is to optimize the collaborative messages from two key aspects: representation and selection. The proposed cod… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 10 pages, Accepted by CVPR 2024

  28. arXiv:2405.02834  [pdf, other

    cs.CV

    Scene-Adaptive Person Search via Bilateral Modulations

    Authors: Yimin Jiang, Huibing Wang, **jia Peng, ** Fu, Yang Wang

    Abstract: Person search aims to localize specific a target person from a gallery set of images with various scenes. As the scene of moving pedestrian changes, the captured person image inevitably bring in lots of background noise and foreground noise on the person feature, which are completely unrelated to the person identity, leading to severe performance degeneration. To address this issue, we present a S… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  29. arXiv:2405.02832  [pdf, other

    cs.CV

    Fast One-Stage Unsupervised Domain Adaptive Person Search

    Authors: Tianxiang Cui, Huibing Wang, **jia Peng, Ruoxi Deng, ** Fu, Yang Wang

    Abstract: Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations, which is extremely challenging due to the unexpected variations of the unlabeled domains. However, most existing methods dedicate to develo** multi-stage models to adapt domain variations while using clustering for iterative model training, which inevitably increases mod… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  30. TALICS$^3$: Tape Library Cloud Storage System Simulator

    Authors: Suayb S. Arslan, James Peng, Turguy Goker

    Abstract: High performance computing data is surging fast into the exabyte-scale world, where tape libraries are the main platform for long-term durable data storage besides high-cost DNA. Tape libraries are extremely hard to model, but accurate modeling is critical for system administrators to obtain valid performance estimates for their designs. This research introduces a discrete--event tape simulation p… ▽ More

    Submitted 12 June, 2024; v1 submitted 18 January, 2024; originally announced May 2024.

    Comments: 15 pages, 13 figures

    Journal ref: Simulation Modelling Practice and Theory, Volume 134, 2024, 102947

  31. arXiv:2404.19358  [pdf, other

    cs.IT

    QML-IB: Quantized Collaborative Intelligence between Multiple Devices and the Mobile Network

    Authors: **gchen Peng, Boxiang Ren, Lu Yang, Chenghui Peng, Panpan Niu, Hao Wu

    Abstract: The integration of artificial intelligence (AI) and mobile networks is regarded as one of the most important scenarios for 6G. In 6G, a major objective is to realize the efficient transmission of task-relevant data. Then a key problem arises, how to design collaborative AI models for the device side and the network side, so that the transmitted data between the device and the network is efficient… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  32. arXiv:2404.18359  [pdf, other

    cs.CL cs.AI

    FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

    Authors: Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, **yang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

    Abstract: In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choi… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  33. arXiv:2404.17825  [pdf, other

    cs.CV

    ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing

    Authors: Zhongze Wang, Haitao Zhao, **gchao Peng, Lujian Yao, Kaijie Zhao

    Abstract: Unpaired image dehazing (UID) holds significant research importance due to the challenges in acquiring haze/clear image pairs with identical backgrounds. This paper proposes a novel method for UID named Orthogonal Decoupling Contrastive Regularization (ODCR). Our method is grounded in the assumption that an image consists of both haze-related features, which influence the degree of haze, and haze-… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  34. arXiv:2404.17534  [pdf, other

    cs.CV cs.MM

    Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models

    Authors: Yuhang Huang, Zihan Wu, Chongyang Gao, Jiawei Peng, Xu Yang

    Abstract: Large Vision-Language Models (LVLMs) are gaining traction for their remarkable ability to process and integrate visual and textual data. Despite their popularity, the capacity of LVLMs to generate precise, fine-grained textual descriptions has not been fully explored. This study addresses this gap by focusing on \textit{distinctiveness} and \textit{fidelity}, assessing how models like Open-Flaming… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 11 pages, 9 figures, 6 tables. For associated code, see https://anonymous.4open.science/r/Explore_FGVDs-E277

  35. arXiv:2404.15805  [pdf, other

    q-bio.BM cs.LG

    Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering

    Authors: Shujian Jiao, Bingxuan Li, Lei Wang, Xiao** Zhang, Wei Chen, Jiajie Peng, Zhongyu Wei

    Abstract: Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  36. arXiv:2404.15704  [pdf, other

    cs.LG cs.AI cs.SD eess.AS

    Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

    Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, **g Xiao

    Abstract: Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limits improvements. To this end, we propose an adversarial comp… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  37. arXiv:2404.13923  [pdf, other

    cs.CV

    MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

    Authors: Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu Man Zhang, Qing Li, Xucheng Yin, Zhaoxiang Zhang, Junran Peng

    Abstract: Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting 2D generative prior to the 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture.… ▽ More

    Submitted 16 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  38. arXiv:2404.13892  [pdf, other

    cs.SD cs.AI eess.AS

    Retrieval-Augmented Audio Deepfake Detection

    Authors: Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, **g Xiao, Jianzong Wang

    Abstract: With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired… ▽ More

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Conference on Multimedia Retrieval (ICMR 2024)

  39. arXiv:2404.13647  [pdf, other

    cs.LG

    Mean Aggregator Is More Robust Than Robust Aggregators Under Label Poisoning Attacks

    Authors: Jie Peng, Weiyu Li, Qing Ling

    Abstract: Robustness to malicious attacks is of paramount importance for distributed learning. Existing works often consider the classical Byzantine attacks model, which assumes that some workers can send arbitrarily malicious messages to the server and disturb the aggregation steps of the distributed learning process. To defend against such worst-case Byzantine attacks, various robust aggregators have been… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  40. arXiv:2404.13496  [pdf, other

    math.NA cs.AI

    ODE-DPS: ODE-based Diffusion Posterior Sampling for Inverse Problems in Partial Differential Equation

    Authors: Enze Jiang, Jishen Peng, Zheng Ma, Xiong-Bin Yan

    Abstract: In recent years we have witnessed a growth in mathematics for deep learning, which has been used to solve inverse problems of partial differential equations (PDEs). However, most deep learning-based inversion methods either require paired data or necessitate retraining neural networks for modifications in the conditions of the inverse problem, significantly reducing the efficiency of inversion and… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  41. arXiv:2404.11326  [pdf, other

    cs.CV

    Single-temporal Supervised Remote Change Detection for Domain Generalization

    Authors: Qiangang Du, **long Peng, Xu Chen, Qingdong He, Liren He, Qiang Nie, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Change detection is widely applied in remote sensing image analysis. Existing methods require training models separately for each dataset, which leads to poor domain generalization. Moreover, these methods rely heavily on large amounts of high-quality pair-labelled data for training, which is expensive and impractical. In this paper, we propose a multimodal contrastive learning (ChangeCLIP) based… ▽ More

    Submitted 23 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  42. arXiv:2404.11318  [pdf, other

    cs.CV

    Leveraging Fine-Grained Information and Noise Decoupling for Remote Sensing Change Detection

    Authors: Qiangang Du, **long Peng, Changan Wang, Xu Chen, Qingdong He, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Change detection aims to identify remote sense object changes by analyzing data between bitemporal image pairs. Due to the large temporal and spatial span of data collection in change detection image pairs, there are often a significant amount of task-specific and task-agnostic noise. Previous effort has focused excessively on denoising, with this goes a great deal of loss of fine-grained informat… ▽ More

    Submitted 21 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  43. arXiv:2404.09193  [pdf, other

    cs.CV

    FaceCat: Enhancing Face Recognition Security with a Unified Generative Model Framework

    Authors: Jiawei Chen, Xiao Yang, Yinpeng Dong, Hang Su, Jianteng Peng, Zhaoxia Yin

    Abstract: Face anti-spoofing (FAS) and adversarial detection (FAD) have been regarded as critical technologies to ensure the safety of face recognition systems. As a consequence of their limited practicality and generalization, some existing methods aim to devise a framework capable of concurrently detecting both threats to address the challenge. Nevertheless, these methods still encounter challenges of ins… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Under review

  44. arXiv:2404.07821  [pdf, other

    cs.CV

    Sparse Laneformer

    Authors: Ji Liu, Zifeng Zhang, Mingjie Lu, Hongyang Wei, Dong Li, Yile Xie, **zhang Peng, Lu Tian, Ashish Sirasao, Emad Barsoum

    Abstract: Lane detection is a fundamental task in autonomous driving, and has achieved great progress as deep learning emerges. Previous anchor-based methods often design dense anchors, which highly depend on the training dataset and remain fixed during inference. We analyze that dense anchors are not necessary for lane detection, and propose a transformer-based lane detection framework based on a sparse an… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  45. arXiv:2404.03327  [pdf, other

    cs.CV eess.IV

    DI-Retinex: Digital-Imaging Retinex Theory for Low-Light Image Enhancement

    Authors: Shangquan Sun, Wenqi Ren, **gyang Peng, Fenglong Song, Xiaochun Cao

    Abstract: Many existing methods for low-light image enhancement (LLIE) based on Retinex theory ignore important factors that affect the validity of this theory in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex theory (DI-Retinex) through theoretical and experimental analysis of Retinex t… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  46. arXiv:2404.01133  [pdf, other

    cs.CV

    CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

    Authors: Yang Liu, He Guan, Chuanchen Luo, Lue Fan, Junran Peng, Zhaoxiang Zhang

    Abstract: The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS). However, effectively training large-scale 3DGS and rendering it in real-time across various scales remains challenging. This paper introduces CityGaussian (CityGS), which employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strate… ▽ More

    Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Project Page: https://dekuliutesla.github.io/citygs/

  47. arXiv:2404.00667  [pdf, other

    cs.CV

    Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation

    Authors: Dafei Qiu, Shan Xiong, Jia** Yi, Jialin Peng

    Abstract: Accurate segmentation of organelle instances from electron microscopy (EM) images plays an essential role in many neuroscience researches. However, practical scenarios usually suffer from high annotation costs, label scarcity, and large domain diversity. While unsupervised domain adaptation (UDA) that assumes no annotation effort on the target data is promising to alleviate these challenges, its p… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

  48. arXiv:2404.00658  [pdf, other

    cs.CV

    KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

    Authors: Jihua Peng, Yanghong Zhou, P. Y. Mok

    Abstract: This paper presents a novel Kinematics and Trajectory Prior Knowledge-Enhanced Transformer (KTPFormer), which overcomes the weakness in existing transformer-based methods for 3D human pose estimation that the derivation of Q, K, V vectors in their self-attention mechanisms are all based on simple linear map**. We propose two prior attention modules, namely Kinematics Prior Attention (KPA) and Tr… ▽ More

    Submitted 2 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024,GitHub code:https://github.com/JihuaPeng/KTPFormer

  49. arXiv:2403.15698  [pdf, other

    cs.CV cs.AI

    SceneX:Procedural Controllable Large-scale Scene Generation via Large-language Models

    Authors: Mengqi Zhou, Jun Hou, Chuanchen Luo, Yuxi Wang, Zhaoxiang Zhang, Junran Peng

    Abstract: Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g. point cloud or radiance field) incompatible with the industrial pipeline, which leads to a… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  50. arXiv:2403.12787  [pdf, other

    cs.CV

    DDSB: An Unsupervised and Training-free Method for Phase Detection in Echocardiography

    Authors: Zhenyu Bu, Yang Liu, Jiayu Huo, **g**g Peng, Kaini Wang, Guangquan Zhou, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

    Abstract: Accurate identification of End-Diastolic (ED) and End-Systolic (ES) frames is key for cardiac function assessment through echocardiography. However, traditional methods face several limitations: they require extensive amounts of data, extensive annotations by medical experts, significant training resources, and often lack robustness. Addressing these challenges, we proposed an unsupervised and tra… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.