Skip to main content

Showing 1–50 of 302 results for author: Zheng, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.16007  [pdf, other

    cs.CL

    Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning

    Authors: Bowen Zheng, Ming Ma, Zhongqiao Lin, Tianming Yang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being In-Context Learning (ICL). With ICL, LLMs can derive the underlying rule from a few demonstrations and provide answers that comply with the rule. Previous work hypothesized that the network creates a "task vector" in specific positions during ICL. Patching the "task vector" allows LLMs to achieve z… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.15513  [pdf, other

    cs.AI cs.CL

    PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

    Authors: Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang

    Abstract: In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in large language models (LLMs). As a sibling project to SafeRLHF and BeaverTails, we separate annotations of helpfulness and harmlessness for question-answering pairs, providing distinct perspectives on these coupled attributes. Overall, we provide 44.6k refined prompts and 265k question-answer p… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: a sibling project to SafeRLHF and BeaverTails

  3. arXiv:2406.14550  [pdf, other

    cs.CL cs.AI

    GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

    Authors: Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first four authors contributed equally, 27 pages

  4. arXiv:2406.11503  [pdf, other

    cs.CV cs.CL

    GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

    Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng

    Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source da… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.01375  [pdf, other

    cs.CL

    D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    Authors: Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2406.01359  [pdf, other

    cs.CL cs.SE

    R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

    Authors: Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, **gxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng

    Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of… ▽ More

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2406.00023  [pdf, other

    cs.CL

    LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

    Authors: **g Li, Zhijie Sun, Dachao Lin, Xuan He, Yi Lin, Binfan Zheng, Li Zeng, Rongqian Zhao, Xin Chen

    Abstract: Mixture-of-Experts (MoE) architectures have recently gained increasing popularity within the domain of large language models (LLMs) due to their ability to significantly reduce training and inference overhead. However, MoE architectures face challenges, such as significant disparities in the number of tokens assigned to each expert and a tendency toward homogenization among experts, which adversel… ▽ More

    Submitted 23 May, 2024; originally announced June 2024.

  8. arXiv:2405.20750  [pdf, other

    cs.CV

    Diffusion Models Are Innate One-Step Generators

    Authors: Bowen Zheng, Tianming Yang

    Abstract: Diffusion Models (DMs) have achieved great success in image generation and other fields. By fine sampling through the trajectory defined by the SDE/ODE solver based on a well-trained score model, DMs can generate remarkable high-quality results. However, this precise sampling often requires multiple steps and is computationally demanding. To address this problem, instance-based distillation method… ▽ More

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures and 4 tables on the main contents

  9. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, **gyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  10. arXiv:2405.17047  [pdf, other

    cs.RO cs.LG

    Interpretable Robotic Manipulation from Language

    Authors: Boyuan Zheng, Jianlong Zhou, Fang Chen

    Abstract: Humans naturally employ linguistic instructions to convey knowledge, a process that proves significantly more complex for machines, especially within the context of multitask robotic manipulation environments. Natural language, moreover, serves as the primary medium through which humans acquire new knowledge, presenting a potentially intuitive bridge for translating concepts understandable by huma… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.16605  [pdf, other

    cs.CV

    Demystify Mamba in Vision: A Linear Attention Perspective

    Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

    Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similar… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  12. arXiv:2405.16141  [pdf, other

    cs.LG cs.AI cs.CE

    AIGB: Generative Auto-bidding via Diffusion Modeling

    Authors: Jiayan Guo, Yusen Huo, Zhilin Zhang, Tianyu Wang, Chuan Yu, Jian Xu, Yan Zhang, Bo Zheng

    Abstract: Auto-bidding plays a crucial role in facilitating online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled through the Markovian Decision Process (MDP), which assumes the Markovian state transition. This assumption restricts the ability to perform in long horizon… ▽ More

    Submitted 27 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  13. arXiv:2405.15790  [pdf, ps, other

    cs.CC

    The Radical Solution and Computational Complexity

    Authors: Bo** Zheng, Weiwu Wang

    Abstract: The radical solution of polynomials with rational coefficients is a famous solved problem. This paper found that it is a $\mathbb{NP}$ problem. Furthermore, this paper found that arbitrary $ \mathscr{P} \in \mathbb{P}$ shall have a one-way running graph $G$, and have a corresponding $\mathscr{Q} \in \mathbb{NP}$ which have a two-way running graph $G'$, $G$ and $G'$ is isomorphic, i.e., $G'$ is com… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  14. arXiv:2405.15738  [pdf, other

    cs.CV

    ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

    Authors: Chunjiang Ge, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, Bo Zheng

    Abstract: High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens. However, the redundancy in visual tokens is the key problem as it leads to more substantial compute. To mitigate this issue, we propose ConvLLaVA, which emplo… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages

  15. arXiv:2405.15734  [pdf, other

    cs.CV

    LM4LV: A Frozen Large Language Model for Low-level Vision Tasks

    Authors: Boyang Zheng, **** Gu, Shijun Li, Chao Dong

    Abstract: The success of large language models (LLMs) has fostered a new research trend of multi-modality large language models (MLLMs), which changes the paradigm of various fields in computer vision. Though MLLMs have shown promising results in numerous high-level vision and vision-language tasks such as VQA and text-to-image, no works have demonstrated how low-level vision tasks can benefit from MLLMs. W… ▽ More

    Submitted 11 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  16. arXiv:2405.14040  [pdf, other

    cs.MM

    Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline

    Authors: Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin **

    Abstract: Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video captioning and video story generation have made some progress. However, in practical applications, we typically require synchronized narrations for ongoing visual scenes… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 15 pages, 13 figures

  17. arXiv:2405.13581  [pdf, other

    cs.CV cs.AI

    Safety Alignment for Vision Language Models

    Authors: Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng

    Abstract: Benefiting from the powerful capabilities of Large Language Models (LLMs), pre-trained visual encoder models connected to an LLMs can realize Vision Language Models (VLMs). However, existing research shows that the visual modality of VLMs is vulnerable, with attackers easily bypassing LLMs' safety alignment through visual modality features to launch attacks. To address this issue, we enhance the e… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 23 pages, 15 figures

  18. arXiv:2405.12001  [pdf, other

    cs.LG cs.AI

    Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning

    Authors: Hai Zhang, Boyuan Zheng, Anqi Guo, Tianying Ji, Pheng-Ann Heng, Junqiao Zhao, Lanqing Li

    Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that maximizing the mutual information between the task and the task representation ($I(Z;M)$) can lead to performance impro… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  19. arXiv:2405.06951  [pdf, ps, other

    cs.IT eess.SP

    Intelligent Reflecting Surface-Aided Radar Spoofing

    Authors: Haozhe Wang, Beixiong Zheng, Xiaodan Shao, Rui Zhang

    Abstract: Electronic countermeasure (ECM) technology plays a critical role in modern electronic warfare, which can interfere with enemy radar detection systems by noise or deceptive signals. However, the conventional active jamming strategy incurs additional hardware and power costs and has the potential threat of exposing the target itself. To tackle the above challenges, we propose a new intelligent refle… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  20. arXiv:2404.15946  [pdf

    cs.CV cs.AI eess.IV

    Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography

    Authors: Xuxin Chen, Yuheng Li, Mingzhe Hu, Ella Salari, Xiaoqian Chen, Richard L. J. Qiu, Bin Zheng, Xiaofeng Yang

    Abstract: Although fusion of information from multiple views of mammograms plays an important role to increase accuracy of breast cancer detection, develo** multi-view mammograms-based computer-aided diagnosis (CAD) schemes still faces challenges and no such CAD schemes have been used in clinical practice. To overcome the challenges, we investigate a new approach based on Contrastive Language-Image Pre-tr… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  21. arXiv:2404.14768  [pdf, other

    cs.CV

    Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion

    Authors: Hongyu Chen, Yiqi Gao, Min Zhou, Peng Wang, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Recently, integrating visual controls into text-to-image~(T2I) models, such as ControlNet method, has received significant attention for finer control capabilities. While various training-free methods make efforts to enhance prompt following in T2I models, the issue with visual control is still rarely studied, especially in the scenario that visual controls are misaligned with text prompts. In thi… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  22. arXiv:2404.14542  [pdf, other

    cs.CV

    UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

    Authors: Yaofeng Xie, Lingwei Kong, Kai Chen, Ziqiang Zheng, Xiao Yu, Zhibin Yu, Bing Zheng

    Abstract: Learning-based underwater image enhancement (UIE) methods have made great progress. However, the lack of large-scale and high-quality paired training samples has become the main bottleneck hindering the development of UIE. The inter-frame information in underwater videos can accelerate or optimize the UIE process. Thus, we constructed the first large-scale high-resolution underwater video enhancem… ▽ More

    Submitted 27 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages,CVPR2024 accept

    ACM Class: I.4

  23. arXiv:2404.13984  [pdf, other

    cs.CV

    RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

    Authors: Chengrui Wang, Pengfei Liu, Min Zhou, Ming Zeng, Xubin Li, Tiezheng Ge, Bo zheng

    Abstract: Although diffusion models can generate high-quality human images, their applications are limited by the instability in generating hands with correct structures. Some previous works mitigate the problem by considering hand structure yet struggle to maintain style consistency between refined malformed hands and other image regions. In this paper, we aim to solve the problem of inconsistency regardin… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  24. arXiv:2404.13903  [pdf, other

    cs.CV

    Accelerating Image Generation with Sub-path Linear Approximation Model

    Authors: Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

    Abstract: Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining hi… ▽ More

    Submitted 22 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  25. arXiv:2404.13534  [pdf, other

    cs.CV

    Motion-aware Latent Diffusion Models for Video Frame Interpolation

    Authors: Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang

    Abstract: With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, the motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecut… ▽ More

    Submitted 4 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 17 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2303.09508 by other authors

  26. arXiv:2404.06830  [pdf, ps, other

    cs.NI eess.SP

    EMF Exposure Mitigation via MAC Scheduling

    Authors: Silvio Mandelli, Lorenzo Maggi, Bill Zheng, Christophe Grangeat, Azra Zejnilagic

    Abstract: International standards bodies define Electromagnetic field (EMF) emission requirements that can be translated into control of the base station actual Effective Isotropic Radiated Power (EIRP), i.e., averaged over a sliding time window. In this work we show how to comply with such requirements by designing a water-filling power allocation method operating at the MAC scheduler level. Our method ens… ▽ More

    Submitted 19 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2404.05057  [pdf, other

    cs.LG cs.DB

    TimeCSL: Unsupervised Contrastive Learning of General Shapelets for Explorable Time Series Analysis

    Authors: Zhiyu Liang, Chen Liang, Zheng Liang, Hongzhi Wang, Bo Zheng

    Abstract: Unsupervised (a.k.a. Self-supervised) representation learning (URL) has emerged as a new paradigm for time series analysis, because it has the ability to learn generalizable time series representation beneficial for many downstream tasks without using labels that are usually difficult to obtain. Considering that existing approaches have limitations in the design of the representation encoder and t… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  28. GTS: GPU-based Tree Index for Fast Similarity Search

    Authors: Yifan Zhu, Ruiyao Ma, Baihua Zheng, Xiangyu Ke, Lu Chen, Yunjun Gao

    Abstract: Similarity search, the task of identifying objects most similar to a given query object under a specific metric, has gathered significant attention due to its practical applications. However, the absence of coordinate information to accelerate similarity search and the high computational cost of measuring object similarity hinder the efficiency of existing CPU-based methods. Additionally, these me… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGMOD 2024

    Journal ref: Proc. ACM Manag. Data, 2(3): 142:1-142:27

  29. arXiv:2403.19366  [pdf, other

    cs.CV

    Infrared Small Target Detection with Scale and Location Sensitivity

    Authors: Qiankun Liu, Rui Liu, Bolun Zheng, Hongkui Wang, Ying Fu

    Abstract: Recently, infrared small target detection (IRSTD) has been dominated by deep-learning-based methods. However, these methods mainly focus on the design of complex model structures to extract discriminative features, leaving the loss functions for IRSTD under-explored. For example, the widely used Intersection over Union (IoU) and Dice losses lack sensitivity to the scales and locations of targets,… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  30. arXiv:2403.17496  [pdf, other

    cs.CV cs.GR

    Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments

    Authors: Yusuke Takimoto, Hikari Takehara, Hiroyuki Sato, Zihao Zhu, Bo Zheng

    Abstract: In the film and gaming industries, achieving a realistic hair appearance typically involves the use of strands originating from the scalp. However, reconstructing these strands from observed surface images of hair presents significant challenges. The difficulty in acquiring Ground Truth (GT) data has led state-of-the-art learning-based methods to rely on pre-training with manually prepared synthet… ▽ More

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  31. arXiv:2403.13574  [pdf, other

    cs.IR cs.AI

    A Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation

    Authors: Bowen Zheng, Zihan Lin, Enze Liu, Chen Yang, Enyang Bai, Cheng Ling, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: In online video platforms, reading or writing comments on interesting videos has become an essential part of the video watching experience. However, existing video recommender systems mainly model users' interaction behaviors with videos, lacking consideration of comments in user behavior modeling. In this paper, we propose a novel recommendation approach called LSVCR by leveraging user interactio… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  32. arXiv:2403.12352  [pdf, other

    eess.SP cs.IT

    A New Intelligent Reflecting Surface-Aided Electromagnetic Stealth Strategy

    Authors: Xue Xiong, Beixiong Zheng, A. Lee Swindlehurst, Jie Tang, Wen Wu

    Abstract: Electromagnetic wave absorbing material (EWAM) plays an essential role in manufacturing stealth aircraft, which can achieve the electromagnetic stealth (ES) by reducing the strength of the signal reflected back to the radar system. However, the stealth performance is limited by the coating thickness, incident wave angles, and working frequencies. To tackle these limitations, we propose a new intel… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 5 pages, 4 figures

  33. arXiv:2403.11556  [pdf, other

    eess.IV cs.CV

    Hierarchical Frequency-based Upsampling and Refining for Compressed Video Quality Enhancement

    Authors: Qianyu Zhang, Bolun Zheng, Xinying Chen, Quan Chen, Zhunjie Zhu, Can** Wang, Zongpeng Li, Chengang Yan

    Abstract: Video compression artifacts arise due to the quantization operation in the frequency domain. The goal of video quality enhancement is to reduce compression artifacts and reconstruct a visually-pleasant result. In this work, we propose a hierarchical frequency-based upsampling and refining neural network (HFUR) for compressed video quality enhancement. HFUR consists of two modules: implicit frequen… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  34. arXiv:2403.08484  [pdf, other

    cs.CL

    Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH Mask based Efficient Fine-tuning

    Authors: Ming Dong, Kang Xue, Bolong Zheng, Tingting He

    Abstract: In view of the huge number of parameters of Large language models (LLMs) , tuning all parameters is very costly, and accordingly fine-tuning specific parameters is more sensible. Most of parameter efficient fine-tuning (PEFT) concentrate on parameter selection strategies, such as additive method, selective method and reparametrization-based method. However, there are few methods that consider the… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  35. arXiv:2403.06747  [pdf, other

    cs.IR

    MetaSplit: Meta-Split Network for Limited-Stock Product Recommendation

    Authors: Wenhao Wu, Jialiang Zhou, Ailong He, Shuguang Han, Jufeng Chen, Bo Zheng

    Abstract: Compared to business-to-consumer (B2C) e-commerce systems, consumer-to-consumer (C2C) e-commerce platforms usually encounter the limited-stock problem, that is, a product can only be sold one time in a C2C system. This poses several unique challenges for click-through rate (CTR) prediction. Due to limited user interactions for each product (i.e. item), the corresponding item embedding in the CTR m… ▽ More

    Submitted 27 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted at WWW 2024. This work has already been deployed on the Xianyu platform in Alibaba. The first two authors contributed equally

  36. arXiv:2403.04172  [pdf, other

    cs.CV

    SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization

    Authors: Quan Chen, Tingyu Wang, Zihao Yang, Haoran Li, Rongfeng Lu, Yaoqi Sun, Bolun Zheng, Chenggang Yan

    Abstract: Cross-view geo-localization aims to match images of the same target from different platforms, e.g., drone and satellite. It is a challenging task due to the changing both appearance of targets and environmental content from different views. Existing methods mainly focus on digging more comprehensive information through feature maps segmentation, while inevitably destroy the image structure and are… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 12 pages

  37. arXiv:2403.02827  [pdf, other

    cs.CV

    Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation

    Authors: Weijie Li, Litong Gong, Yiran Zhu, Fanda Fan, Biao Wang, Tiezheng Ge, Bo Zheng

    Abstract: Image-to-video (I2V) generation tasks always suffer from kee** high fidelity in the open domains. Traditional image animation techniques primarily focus on specific domains such as faces or human poses, making them difficult to generalize to open domains. Several recent I2V frameworks based on diffusion models can generate dynamic content for open domain images but fail to maintain fidelity. We… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  38. arXiv:2403.02607  [pdf

    cs.GT cs.AI

    MEBS: Multi-task End-to-end Bid Shading for Multi-slot Display Advertising

    Authors: Zhen Gong, Lvyin Niu, Yang Zhao, Miao Xu, Zhenzhe Zheng, Haoqi Zhang, Zhilin Zhang, Fan Wu, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng

    Abstract: Online bidding and auction are crucial aspects of the online advertising industry. Conventionally, there is only one slot for ad display and most current studies focus on it. Nowadays, multi-slot display advertising is gradually becoming popular where many ads could be displayed in a list and shown as a whole to users. However, multi-slot display advertising leads to different cost-effectiveness.… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  39. arXiv:2403.01841  [pdf, other

    cs.CL cs.LG

    Making Pre-trained Language Models Great on Tabular Prediction

    Authors: Jiahuan Yan, Bo Zheng, Hongxia Xu, Yiheng Zhu, Danny Z. Chen, Jimeng Sun, Jian Wu, **tai Chen

    Abstract: The transferability of deep neural networks (DNNs) has made significant progress in image and language processing. However, due to the heterogeneity among tables, such DNN bonus is still far from being well exploited on tabular data prediction (e.g., regression or classification tasks). Condensing knowledge from diverse domains, language models (LMs) possess the capability to comprehend feature na… ▽ More

    Submitted 12 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024 as spotlight presentation (Notable Top 5%). OpenReview link is https://openreview.net/forum?id=anzIzGZuLi, codes will be available at https://github.com/jyansir/tp-berta

  40. arXiv:2403.01800  [pdf, other

    cs.CV

    AtomoVideo: High Fidelity Image-to-Video Generation

    Authors: Litong Gong, Yiran Zhu, Weijie Li, Xiaoyang Kang, Biao Wang, Tiezheng Ge, Bo Zheng

    Abstract: Recently, video generation has achieved significant rapid development based on superior text-to-image generation techniques. In this work, we propose a high fidelity framework for image-to-video generation, named AtomoVideo. Based on multi-granularity image injection, we achieve higher fidelity of the generated video to the given image. In addition, thanks to high quality datasets and training str… ▽ More

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Technical report. Page: https://atomo-video.github.io/

  41. arXiv:2403.01570  [pdf, other

    cs.CL cs.LG

    SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction

    Authors: Jiahuan Yan, **tai Chen, Chaowen Hu, Bo Zheng, Yaojun Hu, Jimeng Sun, Jian Wu

    Abstract: Recent development of large language models (LLMs) has exhibited impressive zero-shot proficiency on generic and common sense questions. However, LLMs' application on domain-specific vertical questions still lags behind, primarily due to the humiliation problems and deficiencies in vertical knowledge. Furthermore, the vertical data annotation process often requires labor-intensive expert involveme… ▽ More

    Submitted 16 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  42. arXiv:2403.00860  [pdf, other

    cs.LG cs.AI cs.NE

    Parallel Algorithms for Exact Enumeration of Deep Neural Network Activation Regions

    Authors: Sabrina Drammis, Bowen Zheng, Karthik Srinivasan, Robert C. Berwick, Nancy A. Lynch, Robert Ajemian

    Abstract: A feedforward neural network using rectified linear units constructs a map** from inputs to outputs by partitioning its input space into a set of convex regions where points within a region share a single affine transformation. In order to understand how neural networks work, when and why they fail, and how they compare to biological intelligence, we need to understand the organization and forma… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

  43. arXiv:2403.00331  [pdf, other

    cs.DC

    WindGP: Efficient Graph Partitioning on Heterogenous Machines

    Authors: Li Zeng, Haohan Huang, Binfan Zheng, Kang Yang, Shengcheng Shao, **hua Zhou, Jun Xie, Rongqian Zhao, Xin Chen

    Abstract: Graph Partitioning is widely used in many real-world applications such as fraud detection and social network analysis, in order to enable the distributed graph computing on large graphs. However, existing works fail to balance the computation cost and communication cost on machines with different power (including computing capability, network bandwidth and memory size), as they only consider repli… ▽ More

    Submitted 6 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 19 pages, 15 figures, 18 tables

  44. arXiv:2402.14762  [pdf, other

    cs.CL cs.AI

    MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

    Authors: Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang

    Abstract: The advent of Large Language Models (LLMs) has drastically enhanced dialogue systems. However, comprehensively evaluating the dialogue abilities of LLMs remains a challenge. Previous benchmarks have primarily focused on single-turn dialogues or provided coarse-grained and incomplete assessments of multi-turn dialogues, overlooking the complexity and fine-grained nuances of real-life dialogues. To… ▽ More

    Submitted 25 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: [ACL 2024] The first three authors contribute equally, 34 pages, repo at https://github.com/mtbench101/mt-bench-101

  45. arXiv:2402.14660  [pdf, other

    cs.CL cs.AI

    ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

    Authors: Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs). Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts, so that mathematical reasoning can… ▽ More

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: The benchmark dataset will be released soon

  46. arXiv:2402.14399  [pdf, other

    cs.IR cs.AI

    Ensure Timeliness and Accuracy: A Novel Sliding Window Data Stream Paradigm for Live Streaming Recommendation

    Authors: Fengqi Liang, Baigong Zheng, Liqin Zhao, Guorui Zhou, Qian Wang, Yanan Niu

    Abstract: Live streaming recommender system is specifically designed to recommend real-time live streaming of interest to users. Due to the dynamic changes of live content, improving the timeliness of the live streaming recommender system is a critical problem. Intuitively, the timeliness of the data determines the upper bound of the timeliness that models can learn. However, none of the previous works addr… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

  47. arXiv:2402.13435  [pdf, other

    cs.IR cs.LG

    Learning to Retrieve for Job Matching

    Authors: Jianqiang Shen, Yuchin Juan, Shaobo Zhang, ** Liu, Wen Pu, Sriram Vasudevan, Qingquan Song, Fedor Borisyuk, Kay Qianqi Shen, Haichao Wei, Yunxiang Ren, Yeou S. Chiou, Sicong Kuang, Yuan Yin, Ben Zheng, Muchen Wu, Shaghayegh Gharghabi, Xiaoqing Wang, Huichao Xue, Qi Guo, Daniel Hewlett, Luke Simon, Liangjie Hong, Wen**g Zhang

    Abstract: Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we d… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  48. arXiv:2402.11904  [pdf, other

    cs.GT cs.LG

    Scalable Virtual Valuations Combinatorial Auction Design by Combining Zeroth-Order and First-Order Optimization Method

    Authors: Zhijian Duan, Haoran Sun, Yichong Xia, Siqiang Wang, Zhilin Zhang, Chuan Yu, Jian Xu, Bo Zheng, Xiaotie Deng

    Abstract: Automated auction design seeks to discover empirically high-revenue and incentive-compatible mechanisms using machine learning. Ensuring dominant strategy incentive compatibility (DSIC) is crucial, and the most effective approach is to confine the mechanism to Affine Maximizer Auctions (AMAs). Nevertheless, existing AMA-based approaches encounter challenges such as scalability issues (arising from… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

  49. arXiv:2402.10196  [pdf, other

    cs.CL cs.AI

    A Trembling House of Cards? Map** Adversarial Attacks against Language Agents

    Authors: Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun

    Abstract: Language agents powered by large language models (LLMs) have seen exploding development. Their capability of using language as a vehicle for thought and communication lends an incredible level of flexibility and versatility. People have quickly capitalized on this capability to connect LLMs to a wide range of external components and environments: databases, tools, the Internet, robotic embodiment,… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  50. arXiv:2402.04476  [pdf, other

    cs.CV cs.AI cs.CL

    Dual-View Visual Contextualization for Web Navigation

    Authors: Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao

    Abstract: Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, mak… ▽ More

    Submitted 30 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024