Skip to main content

Showing 1–50 of 322 results for author: Luo, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00955  [pdf, other

    cs.IT cs.AI eess.SP

    Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy

    Authors: Xiang Jiao, Dingzhu Wen, Guangxu Zhu, Wei Jiang, Wu Luo, Yuanming Shi

    Abstract: Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the e… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This paper was accepted by IEEE Transactions on Vehicular Technology on June 30, 2024

  2. arXiv:2406.14548  [pdf, other

    cs.LG cs.CV

    Consistency Models Made Easy

    Authors: Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter

    Abstract: Consistency models (CMs) are an emerging class of generative models that offer faster sampling than traditional diffusion models. CMs enforce that all points along a sampling trajectory are mapped to the same initial point. But this target leads to resource-intensive training: for example, as of 2024, training a SoTA CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an alternative… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.13007  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, **gyuan Xiao , et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  4. arXiv:2406.12802  [pdf, other

    cs.RO

    Decentralized Multi-Robot Line-of-Sight Connectivity Maintenance under Uncertainty

    Authors: Yupeng Yang, Yiwei Lyu, Yanze Zhang, Sha Yi, Wenhao Luo

    Abstract: In this paper, we propose a novel decentralized control method to maintain Line-of-Sight connectivity for multi-robot networks in the presence of Guassian-distributed localization uncertainty. In contrast to most existing work that assumes perfect positional information about robots or enforces overly restrictive rigid formation against uncertainty, our method enables robots to preserve Line-of-Si… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by RSS 2024

  5. arXiv:2406.11507  [pdf, other

    cs.CV

    Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

    Authors: Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen

    Abstract: Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand distinct models for specific categories, resulting in substantial deployment costs. This raises concerns about multi-class anomaly detection, where a unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industrial Informatics

  6. arXiv:2406.10985  [pdf, other

    cs.CL

    Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

    Authors: Weiyao Luo, Suncong Zheng, Heming Xia, Weikang Wang, Yan Lei, Tianyu Liu, Shuang Chen, Zhifang Sui

    Abstract: Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer a performance degradation when modeling long-term contexts due to they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.07648  [pdf, other

    cs.CV

    M-LRM: Multi-view Large Reconstruction Model

    Authors: Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. It is attributed to that, LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.07333  [pdf, other

    cs.CV

    Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection

    Authors: Haiming Yao, Wei Luo, Yunkang Cao, Yiheng Zhang, Wenyong Yu, Weiming Shen

    Abstract: Texture surface anomaly detection finds widespread applications in industrial settings. However, existing methods often necessitate gathering numerous samples for model training. Moreover, they predominantly operate within a close-set detection framework, limiting their ability to identify anomalies beyond the training dataset. To tackle these challenges, this paper introduces a novel zero-shot te… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: SUBMISSION TO IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS

  9. arXiv:2406.07115  [pdf, other

    cs.CL cs.AI cs.LG

    Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

    Authors: Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang

    Abstract: Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to enhance their reasoning capabilities on complex tasks, thus taking on the role of intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2024] utilizes the depth-first search-based decision tree (DFSDT) method for reasoning with $16000+$ real-world APIs, whi… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  10. arXiv:2406.07070  [pdf, other

    cs.CL

    HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

    Authors: Wen Luo, Tianshu Shen, Wei Li, Guangyue Peng, Richeng Xuan, Houfeng Wang, Xi Yang

    Abstract: Large Language Models (LLMs) have significantly advanced the field of Natural Language Processing (NLP), achieving remarkable performance across diverse tasks and enabling widespread real-world applications. However, LLMs are prone to hallucination, generating content that either conflicts with established knowledge or is unfaithful to the original sources. Existing hallucination benchmarks primar… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.07006  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin **, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, **g**g Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, **long Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, **gfan Tan , et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAWImage Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  12. arXiv:2406.05343  [pdf, other

    cs.AI cs.CL

    M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

    Authors: Wei Song, Yadong Li, Jianhua Xu, Guowei Wu, Lingfeng Ming, Kexin Yi, Weihua Luo, Houyi Li, Yi Du, Fangda Guo, Kaicheng Yu

    Abstract: As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating solely on task performance, such as the accuracy of identifying the attribute of an object. Combining well-developed… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  13. arXiv:2406.03496  [pdf, other

    cs.CL cs.AI cs.LG

    Wings: Learning Multimodal LLMs without Text-only Forgetting

    Authors: Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: Multimodal large language models (MLLMs), initiated with a trained LLM, first align images with text and then fine-tune on multimodal mixed inputs. However, the MLLM catastrophically forgets the text-only instructions, which do not include images and can be addressed within the initial LLM. In this paper, we present Wings, a novel MLLM that excels in both text-only dialogues and multimodal compreh… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  14. arXiv:2406.03035  [pdf, other

    cs.CV

    Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

    Authors: **gyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

    Abstract: Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple characte… ▽ More

    Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  15. arXiv:2406.02539  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Parrot: Multilingual Visual Instruction Tuning

    Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training p… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  16. arXiv:2405.20797  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Ovis: Structural Embedding Alignment for Multimodal Large Language Model

    Authors: Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye

    Abstract: Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly… ▽ More

    Submitted 17 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  17. arXiv:2405.20773  [pdf, other

    cs.CR cs.AI

    Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

    Authors: Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu

    Abstract: With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical. To achieve this objective, it requires us to proactively discover the vulnerability of MLLMs by exploring the attack methods. Thus, structure-based jailbreak attacks, where harmful semantic content is embedded within images, have been proposed to mislead th… ▽ More

    Submitted 12 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  18. arXiv:2405.16874  [pdf, other

    cs.CV

    CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

    Authors: Xingqun Qi, Hengyuan Zhang, Yatian Wang, Jiahao Pan, Chen Liu, Peng Li, Xiaowei Chi, Mengfei Li, Qixun Zhang, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Deriving co-speech 3D gestures has seen tremendous progress in virtual avatar animation. Yet, the existing methods often produce stiff and unreasonable gestures with unseen human speech inputs due to the limited 3D speech-gesture data. In this paper, we propose CoCoGesture, a novel framework enabling vivid and diverse gesture synthesis from unseen human speech prompts. Our key insight is built upo… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: The dataset will be released as soon as possible

  19. arXiv:2405.15831  [pdf, other

    eess.SY cs.AI cs.LG

    Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

    Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

    Abstract: Transmission interface power flow adjustment is a critical measure to ensure the security and economy operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coup… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Power Systems

  20. arXiv:2405.13152  [pdf, other

    cs.CV cs.AI

    Enhancing Interaction Modeling with Agent Selection and Physical Methods for Trajectory Prediction

    Authors: Shiji Huang, Lei Ye, Min Chen, Wenhai Luo, Chenqi Xu, Deyuan Liang, Dihong Wang

    Abstract: In this study, we address the limitations inherent in most existing vehicle trajectory prediction methodologies that indiscriminately incorporate all agents within a predetermined proximity when accounting for inter-agent interactions. These approaches commonly employ attention-based architecture or graph neural networks for encoding interactions, which introduces three challenges: (i) The indiscr… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: code:https://github.com/kkk00714/ASPILin

  21. arXiv:2405.12661  [pdf, other

    cs.CV

    EmoEdit: Evoking Emotions through Image Manipulation

    Authors: **gyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psych… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  22. arXiv:2405.11616  [pdf, other

    cs.CV

    Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

    Authors: Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, ** Tan, Wen** Wang, Qifeng Liu, Yike Guo

    Abstract: In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should… ▽ More

    Submitted 29 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  23. arXiv:2405.11273  [pdf, other

    cs.AI cs.CL cs.CV cs.MM

    Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

    Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To ad… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures. Project Website: https://uni-moe.github.io/. Working in progress

  24. arXiv:2405.09786  [pdf, other

    cs.LG cs.CR

    IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency

    Authors: Linshan Hou, Ruili Feng, Zhongyun Hua, Wei Luo, Leo Yu Zhang, Yiming Li

    Abstract: Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries can maliciously trigger model misclassifications by implanting a hidden backdoor during model training. This paper proposes a simple yet effective input-level backdoor detection (dubbed IBD-PSC) as a `firewall' to filter out malicious testing images. Our method is motivated by an intriguing phenomenon, i.e., paramete… ▽ More

    Submitted 2 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024, 31 pages

  25. arXiv:2405.05695  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

    Authors: Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, **-Gang Yu, Gui-Song Xia, Jiayi Ma

    Abstract: We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure fo… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024

    Journal ref: International Conference on Learning Representations (ICLR), 2024

  26. arXiv:2405.04795  [pdf, other

    cs.LG

    Variational Schrödinger Diffusion Models

    Authors: Wei Deng, Weijian Luo, Yixin Tan, Marin Biloš, Yu Chen, Yuriy Nevmyvaka, Ricky T. Q. Chen

    Abstract: Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the costly implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize… ▽ More

    Submitted 19 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  27. arXiv:2404.15342  [pdf, other

    eess.SP cs.LG

    WaveSleepNet: An Interpretable Network for Expert-like Sleep Staging

    Authors: Yan Pei, Wei Luo

    Abstract: Although deep learning algorithms have proven their efficiency in automatic sleep staging, the widespread skepticism about their "black-box" nature has limited its clinical acceptance. In this study, we propose WaveSleepNet, an interpretable neural network for sleep staging that reasons in a similar way to sleep experts. In this network, we utilize the latent space representations generated during… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 17 pages, 6 figures

  28. arXiv:2404.12107  [pdf, other

    cs.DB cs.DS cs.SI

    Effective Individual Fairest Community Search over Heterogeneous Information Networks

    Authors: Taige Zhao, Jianxin Li, Ningning Cui, Wei Luo

    Abstract: Community search over heterogeneous information networks has been applied to wide domains, such as activity organization and team formation. From these scenarios, the members of a group with the same treatment often have different levels of activity and workloads, which causes unfairness in the treatment between active members and inactive members (called individual unfairness). However, existing… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  29. arXiv:2404.07626  [pdf, other

    cs.CV

    Homography Guided Temporal Fusion for Road Line and Marking Segmentation

    Authors: Shan Wang, Chuong Nguyen, Jiawei Liu, Kaihao Zhang, Wenhan Luo, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Hongdong Li

    Abstract: Reliable segmentation of road lines and markings is critical to autonomous driving. Our work is motivated by the observations that road lines and markings are (1) frequently occluded in the presence of moving vehicles, shadow, and glare and (2) highly structured with low intra-class shape variance and overall high appearance consistency. To solve these issues, we propose a Homography Guided Fusion… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by ICCV 2023

  30. arXiv:2404.03027  [pdf, other

    cs.CR cs.AI cs.CL

    JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks

    Authors: Weidi Luo, Siyuan Ma, Xiaogeng Liu, Xiaoyu Guo, Chaowei Xiao

    Abstract: With the rapid advancements in Multimodal Large Language Models (MLLMs), securing these models against malicious inputs while aligning them with human values has emerged as a critical challenge. In this paper, we investigate an important and unexplored question of whether techniques that successfully jailbreak Large Language Models (LLMs) can be equally effective in jailbreaking MLLMs. To explore… ▽ More

    Submitted 18 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  31. arXiv:2403.19975  [pdf, other

    cs.CV

    Context-Aware Integration of Language and Visual References for Natural Language Tracking

    Authors: Yanyan Shao, Shuting He, Qi Ye, Yuchao Feng, Wenhan Luo, Jiming Chen

    Abstract: Tracking by natural language specification (TNL) aims to consistently localize a target in a video sequence given a linguistic description in the initial frame. Existing methodologies perform language-based and template-based matching for target reasoning separately and merge the matching results from two sources, which suffer from tracking drift when language and visual templates miss-align with… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  32. arXiv:2403.17236  [pdf, other

    cs.LG

    Neural Image Compression with Quantization Rectifier

    Authors: Wei Luo, Bo Chen

    Abstract: Neural image compression has been shown to outperform traditional image codecs in terms of rate-distortion performance. However, quantization introduces errors in the compression process, which can degrade the quality of the compressed image. Existing approaches address the train-test mismatch problem incurred during quantization, the random impact of quantization on the expressiveness of image fe… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Published at International Conference on Machine Learning (ICML) Neural Compression Workshop 2023, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the authors

  33. arXiv:2403.13512  [pdf, other

    cs.CV cs.AI

    Scale Decoupled Distillation

    Authors: Shicai Wei Chunbo Luo Yang Luo

    Abstract: Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers inferior performance compared to the feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output that couples multiple semantic knowledge. This may transfer ambiguous knowled… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024 10 pages 6figure

  34. arXiv:2403.10983  [pdf, other

    cs.CV

    OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

    Authors: Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, Wenhan Luo

    Abstract: Personalization is an important topic in text-to-image generation, especially the challenging multi-concept personalization. Current multi-concept methods are struggling with identity preservation, occlusion, and the harmony between foreground and background. In this work, we propose OMG, an occlusion-friendly personalized generation framework designed to seamlessly integrate multiple concepts wit… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Homepage: https://kongzhecn.github.io/omg-project/ Github: https://github.com/kongzhecn/OMG/

  35. arXiv:2403.06642  [pdf, other

    cs.IR cs.AI cs.CL

    TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance

    Authors: Weiqing Luo, Chonggang Song, Lingling Yi, Gong Cheng

    Abstract: Combining semantic information with behavioral data is a crucial research area in recommender systems. A promising approach involves leveraging external knowledge to enrich behavioral-based recommender systems with abundant semantic information. However, this approach faces two primary challenges: denoising raw external knowledge and adapting semantic representations. To address these challenges,… ▽ More

    Submitted 24 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  36. arXiv:2403.06430  [pdf, other

    cs.CV

    AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration

    Authors: Zhenbo Song, Wenhao Gao, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu

    Abstract: Deep learning-based face restoration models, increasingly prevalent in smart devices, have become targets for sophisticated backdoor attacks. These attacks, through subtle trigger injection into input face images, can lead to unexpected restoration outcomes. Unlike conventional methods focused on classification tasks, our approach introduces a unique degradation objective tailored for attacking re… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  37. arXiv:2403.05906  [pdf, other

    eess.IV cs.CV

    Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration

    Authors: **gyun Xue, Tao Wang, Jun Wang, Kaihao Zhang, Wenhan Luo, Wenqi Ren, Zikun Liu, Hyunhee Park, Xiaochun Cao

    Abstract: Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 13 pages, 10 figures, conference or other essential info

  38. arXiv:2403.04194  [pdf, other

    cs.CV

    SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising

    Authors: Tao Zhou, Wenhan Luo, Qi Ye, Zhiguo Shi, Jiming Chen

    Abstract: Recently, promptable segmentation models, such as the Segment Anything Model (SAM), have demonstrated robust zero-shot generalization capabilities on static images. These promptable models exhibit denoising abilities for imprecise prompt inputs, such as imprecise bounding boxes. In this paper, we explore the potential of applying SAM to track and segment objects in videos where we recognize the tr… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  39. arXiv:2403.01364  [pdf, other

    cs.CL

    Improving Cross-lingual Representation for Semantic Retrieval with Code-switching

    Authors: Mieradilijiang Maimaiti, Yuanhang Zheng, Ji Zhang, Fei Huang, Yue Zhang, Wenpei Luo, Kaiyu Huang

    Abstract: Semantic Retrieval (SR) has become an indispensable part of the FAQ system in the task-oriented question-answering (QA) dialogue scenario. The demands for a cross-lingual smart-customer-service system for an e-commerce platform or some particular business conditions have been increasing recently. Most previous studies exploit cross-lingual pre-trained models (PTMs) for multi-lingual knowledge retr… ▽ More

    Submitted 2 March, 2024; originally announced March 2024.

  40. arXiv:2402.16438  [pdf, other

    cs.CL

    Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

    Authors: Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen

    Abstract: Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specially,… ▽ More

    Submitted 6 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024

  41. arXiv:2402.16141  [pdf, other

    cs.CL

    PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

    Authors: Xiangdi Meng, Damai Dai, Weiyao Luo, Zhe Yang, Shaoxiang Wu, Xiaochen Wang, Peiyi Wang, Qingxiu Dong, Liang Chen, Zhifang Sui

    Abstract: Supervised fine-tuning is the most common method to adapt large language models (LLMs) to downstream tasks, but full fine-tuning LLMs requires massive computational resources. Recently, parameter-efficient fine-tuning (PEFT) methods have been widely studied due to its cost-effectiveness. LoRA is one of the most widely used methods, which assumes that the optimization process is essentially low-dim… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  42. arXiv:2402.13629  [pdf, other

    eess.IV cs.CV

    Adversarial Purification and Fine-tuning for Robust UDC Image Restoration

    Authors: Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu

    Abstract: This study delves into the enhancement of Under-Display Camera (UDC) image restoration models, focusing on their robustness against adversarial attacks. Despite its innovative approach to seamless display integration, UDC technology faces unique image degradation challenges exacerbated by the susceptibility to adversarial perturbations. Our research initially conducts an in-depth robustness evalua… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  43. arXiv:2402.13587  [pdf, other

    cs.CL cs.CV

    A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

    Authors: Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang

    Abstract: In this paper, we propose a new setting for generating product descriptions from images, augmented by marketing keywords. It leverages the combined power of visual and textual information to create descriptions that are more tailored to the unique features of products. For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language mode… ▽ More

    Submitted 7 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted by LREC-COLING 2024

  44. arXiv:2402.13499  [pdf, other

    cs.AR

    Benchmarking and Dissecting the Nvidia Hopper GPU Architecture

    Authors: Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Qiang Wang, Xiaowen Chu

    Abstract: Graphics processing units (GPUs) are continually evolving to cater to the computational demands of contemporary general-purpose workloads, particularly those driven by artificial intelligence (AI) utilizing deep learning techniques. A substantial body of studies have been dedicated to dissecting the microarchitectural metrics characterizing diverse GPU generations, which helps researchers understa… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  45. arXiv:2402.02374  [pdf, other

    cs.CV

    PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

    Authors: Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang

    Abstract: Existing single image reflection removal (SIRR) methods using deep learning tend to miss key low-frequency (LF) and high-frequency (HF) differences in images, affecting their effectiveness in removing reflections. To address this problem, this paper proposes a novel prompt-guided reflection removal (PromptRR) framework that uses frequency information as new visual prompts for better reflection per… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 10 pages, 10 figures

  46. arXiv:2402.02165  [pdf, other

    cs.LG

    Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error

    Authors: Haoran Li, Zicheng Zhang, Wang Luo, Congying Han, Yudong Hu, Tiande Guo, Shichen Liao

    Abstract: Establishing robust policies is essential to counter attacks or disturbances affecting deep reinforcement learning (DRL) agents. Recent studies explore state-adversarial robustness and suggest the potential lack of an optimal robust policy (ORP), posing challenges in setting strict robustness constraints. This work further investigates ORP: At first, we introduce a consistency assumption of policy… ▽ More

    Submitted 19 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Journal ref: ICML 2024 Oral

  47. arXiv:2402.02033  [pdf, ps, other

    cs.AI cs.NE

    Benchmark for CEC 2024 Competition on Multiparty Multiobjective Optimization

    Authors: Wenjian Luo, Peilan Xu, Shengxiang Yang, Yuhui Shi

    Abstract: The competition focuses on Multiparty Multiobjective Optimization Problems (MPMOPs), where multiple decision makers have conflicting objectives, as seen in applications like UAV path planning. Despite their importance, MPMOPs remain understudied in comparison to conventional multiobjective optimization. The competition aims to address this gap by encouraging researchers to explore tailored modelin… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: 15 pages, 0 figures

  48. arXiv:2401.10727  [pdf, other

    cs.CV

    MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

    Authors: Chenyu Wang, Weixin Luo, Qianyu Chen, Haonan Mai, **di Guo, Sixun Dong, Xiaohua, Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

    Abstract: Recently, the astonishing performance of large language models (LLMs) in natural language comprehension and generation tasks triggered lots of exploration of using them as central controllers to build agent systems. Multiple studies focus on bridging the LLMs to external tools to extend the application scenarios. However, the current LLMs' perceiving tool-use ability is limited to a single text qu… ▽ More

    Submitted 23 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: 21 pages, 9 figures, 10 tables

  49. arXiv:2401.08190  [pdf, other

    cs.CL

    MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

    Authors: Minpeng Liao, Wei Luo, Chengxi Li, **g Wu, Kai Fan

    Abstract: Large language models (LLMs) have seen considerable advancements in natural language understanding tasks, yet there remains a gap to bridge before attaining true artificial general intelligence, especially concerning shortcomings in mathematical reasoning capabilities. We postulate that the inherent nature of LLM training, which focuses on predicting probabilities of next token, presents challenge… ▽ More

    Submitted 21 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  50. arXiv:2401.05960  [pdf, other

    cs.AI

    Machine Learning Insides OptVerse AI Solver: Design Principles and Applications

    Authors: Xijun Li, Fangzhou Zhu, Hui-Ling Zhen, Weilin Luo, Meng Lu, Yimin Huang, Zhenan Fan, Zirui Zhou, Yufei Kuang, Zhihai Wang, Zijie Geng, Yang Li, Haoyang Liu, Zhiwu An, Muming Yang, Jianshu Li, Jie Wang, Junchi Yan, Defeng Sun, Tao Zhong, Yong Zhang, Jia Zeng, Mingxuan Yuan, Jianye Hao, Jun Yao , et al. (1 additional authors not shown)

    Abstract: In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances, and to surpass the capabilities of traditional opt… ▽ More

    Submitted 17 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.