Showing 1–50 of 242 results for author: Peng, C

  1. arXiv:2406.19804  [pdf, ps, other]

    cs.IR

    Rateless Stochastic Coding for Delay-constrained Semantic Communication

    Authors: Cheng Peng, Rulong Wang, Yong Xiao

    Abstract: We consider the problem of joint source-channel coding with distortion and perception constraints from a rateless perspective, the purpose of which is to settle the balance between reliability (distortion/perception) and effectiveness (rate) of transmission over uncertain channels. We find a new finite-blocklength bound for the achievable joint source-channel code rate with the above two constrain…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.11145  [pdf, other]

    cs.CV

    Federated Face Forgery Detection Learning with Personalized Representation

    Authors: Decheng Liu, Zhan Dang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Deep generator technology can produce high-quality fake videos that are indistinguishable, posing a serious social threat. Traditional forgery detection methods directly centralized training on data and lacked consideration of information sharing in non-public video data scenarios and data privacy. Naturally, the federated learning strategy can be applied for privacy protection, which aggregates m…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: The code is publicly available

  3. arXiv:2406.10933  [pdf, other]

    cs.CV

    Improving Adversarial Robustness via Decoupled Visual Representation Masking

    Authors: Decheng Liu, Tao Chen, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Deep neural networks are proven to be vulnerable to fine-designed adversarial examples, and adversarial defense algorithms draw more and more attention nowadays. Pre-processing based defense is a major strategy, as well as learning robust feature representation has been proven an effective way to boost generalization. However, existing defense works lack considering different depth-level visual fe…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: The code is publicly available

  4. arXiv:2406.10887  [pdf, other]

    cs.CV

    Imperceptible Face Forgery Attack via Adversarial Semantic Mask

    Authors: Decheng Liu, Qixuan Su, Chunlei Peng, Nannan Wang, Xinbo Gao

    Abstract: With the great development of generative model techniques, face forgery detection draws more and more attention in the related field. Researchers find that existing face forgery models are still vulnerable to adversarial examples with generated pixel perturbations in the global image. These generated adversarial samples still can't achieve satisfactory performance because of the high detectability…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: The code is publicly available

  5. arXiv:2406.05988  [pdf, other]

    cs.GT

    Sponsored Search Auction Design Beyond Single Utility Maximization

    Authors: Changfeng Xu, Chao Peng, Chenyang Xu, Zhengfeng Yang

    Abstract: Auction design for the modern advertising market has gained significant prominence in the field of game theory. With the recent rise of auto-bidding tools, an increasing number of advertisers in the market are utilizing these tools for auctions. The diverse array of auto-bidding tools has made auction design more challenging. Various types of bidders, such as quasi-linear utility maximizers and co…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: To appear in COCOON 2024

  6. arXiv:2405.18784  [pdf, other]

    cs.CV

    LP-3DGS: Learning to Prune 3D Gaussian Splatting

    Authors: Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang Fan

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical and preset prunin…

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.05202  [pdf, other]

    cs.DS cs.DM cs.LG

    Discretely Beyond $1/e$: Guided Combinatorial Algorithms for Submodular Maximization

    Authors: Yixin Chen, Ankur Nath, Chunli Peng, Alan Kuhnle

    Abstract: For constrained, not necessarily monotone submodular maximization, all known approximation algorithms with ratio greater than $1/e$ require continuous ideas, such as queries to the multilinear extension of a submodular function and its gradient, which are typically expensive to simulate with the original set function. For combinatorial algorithms, the best known approximation ratios for both size…

    Submitted 22 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  8. arXiv:2405.04111  [pdf, other]

    cs.LG eess.SP

    Adaptive Least Mean pth Power Graph Neural Networks

    Authors: Changran Peng, Yi Yan, Ercan E. Kuruoglu

    Abstract: In the presence of impulsive noise, and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean $p^{th}$ Power Graph Neural Networks (LMP-GNN), a universal framework combining adaptive filter and graph neural network for online graph signal estimation. LMP-GNN retains the advantage…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2405.03372  [pdf, other]

    cs.NI cs.AI

    Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G

    Authors: Xiaoxue Yu, Xingfu Yi, Rongpeng Li, Fei Wang, Chenghui Peng, Zhifeng Zhao, Honggang Zhang

    Abstract: In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks like Federated Learning and Split Learning often struggle with significant challenges in dynamic network environments including high synchronization demands, cos…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures

  10. arXiv:2404.19358  [pdf, other]

    cs.IT

    QML-IB: Quantized Collaborative Intelligence between Multiple Devices and the Mobile Network

    Authors: Jingchen Peng, Boxiang Ren, Lu Yang, Chenghui Peng, Panpan Niu, Hao Wu

    Abstract: The integration of artificial intelligence (AI) and mobile networks is regarded as one of the most important scenarios for 6G. In 6G, a major objective is to realize the efficient transmission of task-relevant data. Then a key problem arises, how to design collaborative AI models for the device side and the network side, so that the transmitted data between the device and the network is efficient…

    Submitted 30 April, 2024; originally announced April 2024.

  11. arXiv:2404.16670  [pdf, other]

    cs.CV cs.AI

    EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

    Authors: Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but is still unexplored in vision emotion understanding. In this work, we focus on enhancing the model's proficiency in understanding and adhering to ins…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  12. arXiv:2404.12753  [pdf, other]

    cs.CL cs.AI

    AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation

    Authors: Wenhao Huang, Chenghao Peng, Zhixu Li, Jiaqing Liang, Yanghua Xiao, Liqian Wen, Zulong Chen

    Abstract: Web automation is a significant technique that accomplishes complicated web tasks by automating common web actions, enhancing operational efficiency, and reducing the need for manual intervention. Traditional methods, such as wrappers, suffer from limited adaptability and scalability when faced with a new website. On the other hand, generative agents empowered by large language models (LLMs) exhib…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 18 pages, 5 figures

  13. arXiv:2404.10584  [pdf, other]

    cs.CV

    ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig

    Authors: Chunli Peng, Xuan Dong, Tiantian Cao, Zhengqing Li, Kun Dong, Weixin Li

    Abstract: The fusion of images from dual camera systems featuring a wide-angle and a telephoto camera has become a hotspot problem recently. By integrating simultaneously captured wide-angle and telephoto images from these systems, the resulting fused image achieves a wide field of view (FOV) coupled with high-definition quality. Existing approaches are mostly deep learning methods, and predominantly rely o…

    Submitted 29 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  14. arXiv:2404.09509  [pdf, other]

    cs.CV

    Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder

    Authors: Chong Peng, Liqiang He, Dan Su

    Abstract: Today, there have been many achievements in learning the association between voice and face. However, most previous work models rely on cosine similarity or L2 distance to evaluate the likeness of voices and faces following contrastive learning, subsequently applied to retrieval and matching tasks. This method only considers the embeddings as high-dimensional vectors, utilizing a minimal scope of…

    Submitted 15 April, 2024; originally announced April 2024.

  15. arXiv:2404.08361  [pdf, other]

    cs.IR cs.AI

    Large-Scale Multi-Domain Recommendation: an Automatic Domain Feature Extraction and Personalized Integration Framework

    Authors: Dongbo Xi, Zhen Chen, Yuexian Wang, He Cui, Chong Peng, Fuzhen Zhuang, Peng Yan

    Abstract: Feed recommendation is currently the mainstream mode for many real-world applications (e.g., TikTok, Dianping), it is usually necessary to model and predict user interests in multiple scenarios (domains) within and even outside the application. Multi-domain learning is a typical solution in this regard. While considerable efforts have been made in this regard, there are still two long-standing cha…

    Submitted 14 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 8 pages

  16. arXiv:2404.02655  [pdf, other]

    cs.CL

    Calibrating the Confidence of Large Language Models by Eliciting Fidelity

    Authors: Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu

    Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate. In this paper, we decompose the language model confidence into the Uncertainty about the question and the…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 17 pages, 13 figures

  17. Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation

    Authors: Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng

    Abstract: Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data. However, unreliable pseudo-labeling can undermine the semi-supervision processes. In this paper, we propose an algorithm called Multi-Level Label Correction (…

    Submitted 9 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures. IEEE Transactions on Multimedia, 2024

  18. arXiv:2404.00726  [pdf, other]

    eess.IV cs.CV cs.LG

    MugenNet: A Novel Combined Convolution Neural Network and Transformer Network with its Application for Colonic Polyp Image Segmentation

    Authors: Chen Peng, Zhiqin Qian, Kunyu Wang, Qi Luo, Zhuming Bi, Wenjun Zhang

    Abstract: Biomedical image segmentation is a very important part in disease diagnosis. The term "colonic polyps" refers to polypoid lesions that occur on the surface of the colonic mucosa within the intestinal lumen. In clinical practice, early detection of polyps is conducted through colonoscopy examinations and biomedical image processing. Therefore, the accurate polyp image segmentation is of great signi…

    Submitted 31 March, 2024; originally announced April 2024.

  19. arXiv:2404.00237  [pdf, other]

    cs.RO

    Joint Pedestrian Trajectory Prediction through Posterior Sampling

    Authors: Haotian Lin, Yixiao Wang, Mingxiao Huo, Chensheng Peng, Zhiyuan Liu, Masayoshi Tomizuka

    Abstract: Joint pedestrian trajectory prediction has long grappled with the inherent unpredictability of human behaviors. Recent investigations employing variants of conditional diffusion models in trajectory prediction have exhibited notable success. Nevertheless, the heavy dependence on accurate historical data results in their vulnerability to noise disturbances and data incompleteness. To improve the ro…

    Submitted 30 March, 2024; originally announced April 2024.

  20. arXiv:2403.18593  [pdf, other]

    cs.CV cs.AI

    Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding

    Authors: Run Shao, Zhaoyang Zhang, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li

    Abstract: The tokenizer, as one of the fundamental components of large models, has long been overlooked or even misunderstood in visual tasks. One key factor of the great comprehension power of the large language model is that natural language tokenizers utilize meaningful words or subwords as the basic elements of language. In contrast, mainstream visual tokenizers, represented by patch-based methods such…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 20 pages, 8 figures, 6 tables

  21. arXiv:2403.18274  [pdf, other]

    cs.CV

    DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

    Authors: Jiuming Liu, Dong Zhuo, Zhiheng Feng, Siting Zhu, Chensheng Peng, Zhe Liu, Hesheng Wang

    Abstract: Information inside visual and LiDAR data is well complementary derived from the fine-grained texture of images and massive geometric information in point clouds. However, it remains challenging to explore effective visual-LiDAR fusion, mainly due to the intrinsic data structure inconsistency between two modalities: Images are regular and dense, but LiDAR points are unordered and sparse. To address…

    Submitted 1 July, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV 2024. Code will be released at https://github.com/IRMVLab/DVLO

  22. arXiv:2403.17347  [pdf, other]

    cs.RO

    Unified Path and Gait Planning for Safe Bipedal Robot Navigation

    Authors: Chengyang Peng, Victor Paredes, Ayonga Hereid

    Abstract: Safe path and gait planning are essential for bipedal robots to navigate complex real-world environments. The prevailing approaches often plan the path and gait separately in a hierarchical fashion, potentially resulting in unsafe movements due to neglecting the physical constraints of walking robots. A safety-critical path must not only avoid obstacles but also ensure that the robot's gaits are s…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 8 pages

  23. PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search

    Authors: Chensheng Peng, Zhaoyu Zeng, Jinling Gao, Jundong Zhou, Masayoshi Tomizuka, Xinbing Wang, Chenghu Zhou, Nanyang Ye

    Abstract: Multiple object tracking is a critical task in autonomous driving. Existing works primarily focus on the heuristic design of neural networks to obtain high accuracy. As tracking accuracy improves, however, neural networks become increasingly complex, posing challenges for their practical application in real driving scenarios due to the high level of latency. In this paper, we explore the use of th…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: IEEE Robotics and Automation Letters 2024. Code is available at https://github.com/PholyPeng/PNAS-MOT

    Journal ref: IEEE Robotics and Automation Letters, 2024

  24. arXiv:2403.13089  [pdf]

    cs.CL

    Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning

    Authors: Mengxian Lyu, Cheng Peng, Xiaohan Li, Patrick Balian, Jiang Bian, Yonghui Wu

    Abstract: Automatic text summarization (ATS) is an emerging technology to assist clinicians in providing continuous and coordinated care. This study presents an approach to summarize doctor-patient dialogues using generative large language models (LLMs). We developed prompt-tuning algorithms to instruct generative LLMs to summarize clinical text. We examined the prompt-tuning strategies, the size of soft pr…

    Submitted 19 March, 2024; originally announced March 2024.

  25. arXiv:2403.12374  [pdf]

    cs.CL

    Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning

    Authors: Cheng Peng, Zehao Yu, Kaleb E Smith, Wei-Hsuan Lo-Ciganic, Jiang Bian, Yonghui Wu

    Abstract: The progress in natural language processing (NLP) using large language models (LLMs) has greatly improved patient information extraction from clinical narratives. However, most methods based on the fine-tuning strategy have limited transfer learning ability for cross-domain applications. This study proposed a novel approach that employs a soft prompt-based learning architecture, which introduces t…

    Submitted 18 March, 2024; originally announced March 2024.

  26. arXiv:2403.10550  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows

    Authors: Zhangxuan Dang, Yu Zheng, Xinglin Lin, Chunlei Peng, Qiuyu Chen, Xinbo Gao

    Abstract: With the rapid development of the Internet, various types of anomaly traffic are threatening network security. We consider the problem of anomaly network traffic detection and propose a three-stage anomaly detection framework using only normal traffic. Our framework can generate pseudo anomaly samples without prior knowledge of anomalies to achieve the detection of anomaly data. Firstly, we employ…

    Submitted 12 March, 2024; originally announced March 2024.

  27. arXiv:2403.08604  [pdf, other]

    cs.CL cs.SE

    DevBench: A Comprehensive Benchmark for Software Development

    Authors: Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, ** Yang, Dahua Lin, Chao Peng, Kai Chen

    Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focused on simplified or isolated aspects of programming, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. To this end, we propo…

    Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Our data and code are available at https://github.com/open-compass/DevBench

  28. arXiv:2403.08125  [pdf, other]

    cs.CV

    Q-SLAM: Quadric Representations for Monocular SLAM

    Authors: Chensheng Peng, Chenfeng Xu, Yue Wang, Mingyu Ding, Heng Yang, Masayoshi Tomizuka, Kurt Keutzer, Marco Pavone, Wei Zhan

    Abstract: Monocular SLAM has long grappled with the challenge of accurately modeling 3D geometries. Recent advances in Neural Radiance Fields (NeRF)-based monocular SLAM have shown promise, yet these methods typically focus on novel view synthesis rather than precise 3D geometry modeling. This focus results in a significant disconnect between NeRF applications, i.e., novel-view synthesis and the requirement…

    Submitted 12 March, 2024; originally announced March 2024.

  29. arXiv:2403.04926  [pdf, other]

    cs.CV

    BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling

    Authors: Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa

    Abstract: Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splat…

    Submitted 24 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  30. Pyramid Feature Attention Network for Monocular Depth Prediction

    Authors: Yifang Xu, Chenglei Peng, Ming Li, Yang Li, Sidan Du

    Abstract: Deep convolutional neural networks (DCNNs) have achieved great success in monocular depth estimation (MDE). However, few existing works take the contributions for MDE of different levels feature maps into account, leading to inaccurate spatial layout, ambiguous boundaries and discontinuous object surface in the prediction. To better tackle these problems, we propose a Pyramid Feature Attention Net…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 6 pages, 5 figures

  31. arXiv:2402.16242  [pdf, other]

    cs.CV cs.AI

    HSONet: A Siamese foreground association-driven hard case sample optimization network for high-resolution remote sensing image change detection

    Authors: Chao Tao, Dongsheng Kuang, Zhenyang Huang, Chengli Peng, Haifeng Li

    Abstract: In the later training stages, further improvement of the model's ability to determine changes relies on how well the change detection (CD) model learns hard cases; however, there are two additional challenges to learning hard case samples: (1) change labels are limited and tend to point only to foreground targets, yet hard case samples are prevalent in the background, which leads to optimizing th…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 17 figures, 8 tables, 18 pages

  32. arXiv:2402.13349  [pdf, other]

    cs.CV cs.AI cs.HC

    Aria Everyday Activities Dataset

    Authors: Zhaoyang Lv, Nicholas Charron, Pierre Moulon, Alexander Gamino, Cheng Peng, Chris Sweeney, Edward Miller, Huixuan Tang, Jeff Meissner, Jing Dong, Kiran Somasundaram, Luis Pesqueira, Mark Schwesinger, Omkar Parkhi, Qiao Gu, Renzo De Nardi, Shangyi Cheng, Steve Saarinen, Vijay Baiyya, Yuyang Zou, Richard Newcombe, Jakob Julian Engel, Xiaqing Pan, Carl Ren

    Abstract: We present Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each of the recording contains multimodal sensor data recorded through the Project Aria glasses. In addition, AEA provides machine perception data includi…

    Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Dataset website: https://www.projectaria.com/datasets/aea/

  33. arXiv:2402.12749  [pdf]

    cs.CL cs.AI

    Me LLaMA: Foundation Large Language Models for Medical Applications

    Authors: Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu, Fongci Lin, Xueqing Peng, Jimin Huang, Jeffrey Zhang, Vipina Keloth, Xinyu Zhou, Huan He, Lucila Ohno-Machado, Yonghui Wu, Hua Xu, Jiang Bian

    Abstract: Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a novel medical LLM family that includes foundation mode…

    Submitted 11 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 21 pages, 3 figures, 8 tables

  34. arXiv:2402.05322  [pdf, other]

    cs.LG cs.AI cs.GR cs.SI

    Learning on Multimodal Graphs: A Survey

    Authors: Ciyuan Peng, Jiayuan He, Feng Xia

    Abstract: Multimodal data pervades various domains, including healthcare, social media, and transportation, where multimodal graphs play a pivotal role. Machine learning on multimodal graphs, referred to as multimodal graph learning (MGL), is essential for successful artificial intelligence (AI) applications. The burgeoning research in this field encompasses diverse graph data types and modalities, learning…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 9 pages, 1 figure

  35. arXiv:2401.15304  [pdf, other]

    cs.LG eess.SP

    Adaptive Least Mean Squares Graph Neural Networks and Online Graph Signal Estimation

    Authors: Yi Yan, Changran Peng, Ercan Engin Kuruoglu

    Abstract: The online prediction of multivariate signals, existing simultaneously in space and time, from noisy partial observations is a fundamental task in numerous applications. We propose an efficient Neural Network architecture for the online estimation of time-varying graph signals named the Adaptive Least Mean Squares Graph Neural Networks (LMS-GNN). LMS-GNN aims to capture the time variation and brid…

    Submitted 27 January, 2024; originally announced January 2024.

  36. arXiv:2401.11705  [pdf, other]

    cs.IR cs.AI

    Domain-Aware Cross-Attention for Cross-domain Recommendation

    Authors: Yuhao Luo, Shiwei Ma, Mingjun Nie, Changping Peng, Zhangang Lin, Jingping Shao, Qianfang Xu

    Abstract: Cross-domain recommendation (CDR) is an important method to improve recommender system performance, especially when observations in target domains are sparse. However, most existing cross-domain recommendations fail to fully utilize the target domain's special features and are hard to be generalized to new domains. The designed network is complex and is not suitable for rapid industrial deployment…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 6 pages, 1 figure

  37. Core-periphery Detection Based on Masked Bayesian Non-negative Matrix Factorization

    Authors: Zhonghao Wang, Ru Yuan, Jiaye Fu, Ka-Chun Wong, Chengbin Peng

    Abstract: Core-periphery structure is an essential mesoscale feature in complex networks. Previous researches mostly focus on discriminative approaches while in this work, we propose a generative model called masked Bayesian non-negative matrix factorization. We build the model using two pair affiliation matrices to indicate core-periphery pair associations and using a mask matrix to highlight connections t…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 12 pages, 11 figures. IEEE Transactions on Computational Social Systems (TCSS), 2024, early access

    Journal ref: IEEE Transactions on Computational Social Systems

  38. arXiv:2401.05646  [pdf, other]

    cs.CV

    Masked Attribute Description Embedding for Cloth-Changing Person Re-identification

    Authors: Chunlei Peng, Boyu Wang, Decheng Liu, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Cloth-changing person re-identification (CC-ReID) aims to match persons who change clothes over long periods. The key challenge in CC-ReID is to extract clothing-independent features, such as face, hairstyle, body shape, and gait. Current research mainly focuses on modeling body shape using multi-modal biological features (such as silhouettes and sketches). However, it does not fully leverage the…

    Submitted 2 July, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  39. arXiv:2312.15903  [pdf, other]

    cs.IR

    An Incremental Update Framework for Online Recommenders with Data-Driven Prior

    Authors: Chen Yang, ** Chen, Qian Yu, Xiangdong Wu, Kui Ma, Zihao Zhao, Zhiwei Fang, Wenlong Chen, Chaosheng Fan, Jie He, Changping Peng, Zhangang Lin, Jingping Shao

    Abstract: Online recommenders have attained growing interest and created great revenue for businesses. Given numerous users and items, incremental update becomes a mainstream paradigm for learning large-scale models in industrial scenarios, where only newly arrived data within a sliding window is fed into the model, meeting the strict requirements of quick response. However, this strategy would be prone to…

    Submitted 26 December, 2023; originally announced December 2023.

  40. arXiv:2312.12750  [pdf, other]

    cs.IR cs.AI

    Parallel Ranking of Ads and Creatives in Real-Time Advertising Systems

    Authors: Zhiguang Yang, Lu Wang, Chun Gan, Liufang Sang, Haoran Wang, Wenlong Chen, Jie He, Changping Peng, Zhangang Lin, Jingping Shao

    Abstract: "Creativity is the heart and soul of advertising services". Effective creatives can create a win-win scenario: advertisers can reach target users and achieve marketing objectives more effectively, users can more quickly find products of interest, and platforms can generate more advertising revenue. With the advent of AI-Generated Content, advertisers now can produce vast amounts of creative conten…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 9 pages, 4 figures, AAAI 2024

  41. arXiv:2312.11285  [pdf, other]

    cs.CV cs.AI

    Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model

    Authors: Decheng Liu, Xijun Wang, Chunlei Peng, Nannan Wang, Ruiming Hu, Xinbo Gao

    Abstract: Adversarial attacks involve adding perturbations to the source image to cause misclassification by the target model, which demonstrates the potential of attacking face recognition models. Existing adversarial face image generation methods still can't achieve satisfactory performance because of low transferability and high detectability. In this paper, we propose a unified framework Adv-Diffusion t…

    Submitted 28 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  42. arXiv:2312.11184  [pdf, other]

    cs.CV

    View Transition based Dual Camera Image Fusion

    Authors: Tiantian Cao, Xuan Dong, Chunli Peng, Zhengqing Li, Xinyu Guo, Weixin Li

    Abstract: The dual camera system of wide-angle ($\bf{W}$) and telephoto ($\bf{T}$) cameras has been widely adopted by popular phones. In the overlap region, fusing the $\bf{W}$ and $\bf{T}$ images can generate a higher quality image. Related works perform pixel-level motion alignment or high-dimensional feature alignment of the $\bf{T}$ image to the view of the $\bf{W}$ image and then perform image/feature…

    Submitted 18 December, 2023; originally announced December 2023.

  43. arXiv:2312.10987  [pdf, other]

    cs.CL

    Cross-Subject Data Splitting for Brain-to-Text Decoding

    Authors: Congchi Yin, Qian Yu, Zhiwei Fang, Jie He, Changping Peng, Zhangang Lin, Jingping Shao, Piji Li

    Abstract: Recent major milestones have successfully decoded non-invasive brain signals (e.g. functional Magnetic Resonance Imaging (fMRI) and electroencephalogram (EEG)) into natural language. Despite the progress in model design, how to split the datasets for training, validating, and testing still remains a matter of debate. Most of the prior researches applied subject-specific data splitting, where the d…

    Submitted 14 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  44. arXiv:2312.10320  [pdf, other]

    cs.CV

    Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval

    Authors: Decheng Liu, Xu Luo, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), which aims to use sketches from unseen categories as queries to match the images of the same category. Due to the large cross-modality discrepancy, ZS-SBIR is still a challenging task and mimics realistic zero-shot scenarios. The key is to leverage transferable knowledge from the pre-trained model to improve genera…

    Submitted 15 December, 2023; originally announced December 2023.

  45. arXiv:2312.10201  [pdf, other]

    cs.MM cs.AI

    CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition

    Authors: Cheng Peng, Ke Chen, Lidan Shou, Gang Chen

    Abstract: Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities. The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data. Recent studies are mainly devoted to exploring various fusion strategies to integrate multi-modal information into a unified representation for all labels. However, su…

    Submitted 13 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  46. arXiv:2312.06099  [pdf]

    cs.CL

    Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need

    Authors: Cheng Peng, Xi Yang, Aokun Chen, Zehao Yu, Kaleb E Smith, Anthony B Costa, Mona G Flores, Jiang Bian, Yonghui Wu

    Abstract: Objective To solve major clinical natural language processing (NLP) tasks using a unified text-to-text learning architecture based on a generative large language model (LLM) via prompt tuning. Methods We formulated 7 key clinical NLP tasks as text-to-text learning and solved them using one unified generative clinical LLM, GatorTronGPT, developed using GPT-3 architecture and trained with up to 20 b…

    Submitted 10 December, 2023; originally announced December 2023.

  47. arXiv:2312.04961  [pdf, other]

    cs.CV

    DeepFidelity: Perceptual Forgery Fidelity Assessment for Deepfake Detection

    Authors: Chunlei Peng, Huiqing Guo, Decheng Liu, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Deepfake detection refers to detecting artificially generated or edited faces in images or videos, which plays an essential role in visual information security. Despite promising progress in recent years, Deepfake detection remains a challenging problem due to the complexity and variability of face forgery techniques. Existing Deepfake detection methods are often devoted to extracting features by…

    Submitted 7 December, 2023; originally announced December 2023.

  48. arXiv:2312.01546  [pdf, other]

    cs.IT eess.SP

    Learning Channel Capacity with Neural Mutual Information Estimator Based on Message Importance Measure

    Authors: Zhefan Li, Rui She, Pingyi Fan, Chenghui Peng, Khaled B. Letaief

    Abstract: Channel capacity estimation plays a crucial role in beyond 5G intelligent communications. Despite its significance, this task is challenging for a majority of channels, especially for the complex channels not modeled as the well-known typical ones. Recently, neural networks have been used in mutual information estimation and optimization. They are particularly considered as efficient tools for lea…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 31 pages, 5 figures

  49. arXiv:2311.17074  [pdf, other]

    cs.CV

    Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification

    Authors: Siyuan Huang, Yifan Zhou, Ram Prabhakar, Xijun Liu, Yuxiang Guo, Hongrui Yi, Cheng Peng, Rama Chellappa, Chun Pong Lau

    Abstract: Person Re-Identification (ReID) is a challenging problem, focusing on identifying individuals across diverse settings. However, previous ReID methods primarily concentrated on a single domain or modality, such as Clothes-Changing ReID (CC-ReID) and video ReID. Real-world ReID is not constrained by factors like clothes or input types. Recent approaches emphasize on learning semantics through pre-tr…

    Submitted 14 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  50. arXiv:2311.16497  [pdf, other]

    cs.CV

    GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation

    Authors: Yuxiang Guo, Anshul Shah, Jiang Liu, Ayush Gupta, Rama Chellappa, Cheng Peng

    Abstract: Gait recognition holds the promise to robustly identify subjects based on walking patterns instead of appearance information. In recent years, this field has been dominated by learning methods based on two principal input representations: dense silhouette masks or sparse pose keypoints. In this work, we propose a novel, point-based Contour-Pose representation, which compactly expresses both body s…

    Submitted 14 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.