Showing 1–50 of 2,007 results for author: Yang, Z

Searching in archive cs.
  1. arXiv:2407.01235 [pdf, other]

    cs.CR

    A Fingerprint for Large Language Models

    Authors: Zhiguang Yang, Hanzhou Wu

    Abstract: Recent advances show that scaling a pre-trained language model could achieve state-of-the-art performance on many downstream tasks, prompting large language models (LLMs) to become a hot research topic in the field of artificial intelligence. However, due to the resource-intensive nature of training LLMs from scratch, it is urgent and crucial to protect the intellectual property of LLMs against in…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: https://scholar.google.com/citations?user=IdiF7M0AAAAJ&hl=en

  2. arXiv:2407.00664 [pdf, other]

    cs.CV cs.AI

    SCMIL: Sparse Context-aware Multiple Instance Learning for Predicting Cancer Survival Probability Distribution in Whole Slide Images

    Authors: Zekang Yang, Hong Liu, Xiangdong Wang

    Abstract: Cancer survival prediction is a challenging task that involves analyzing the tumor microenvironment within Whole Slide Images (WSIs). Previous methods cannot effectively capture the intricate interaction features among instances within a local area of a WSI. Moreover, existing WSI-based methods for cancer survival prediction often fail to provide more clinically meaningful predictions. To ov…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  3. arXiv:2406.19672 [pdf, other]

    cs.CV

    Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

    Authors: Chengrui Gao, Ziyuan Yang, Andrew Beng Jin Teoh, Min Zhu

    Abstract: Recently, finger knuckle prints (FKPs) have gained attention due to their rich textural patterns, positioning them as a promising biometric for identity recognition. Prior FKP recognition methods predominantly leverage first-order feature descriptors, which capture intricate texture details but fail to account for structural information. Emerging research, however, indicates that second-order text…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2406.19665 [pdf, other]

    cs.CV

    PM-VIS+: High-Performance Video Instance Segmentation without Video Annotation

    Authors: Zhangjing Yang, Dun Liu, Xin Wang, Zhe Li, Barathwaj Anandan, Yi Wu

    Abstract: Video instance segmentation requires detecting, segmenting, and tracking objects in videos, typically relying on costly video annotations. This paper introduces a method that eliminates video annotations by utilizing image datasets. The PM-VIS algorithm is adapted to handle both bounding box and instance-level pixel annotations dynamically. We introduce ImageNet-bbox to supplement missing categori…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: MIPR 2024

  5. arXiv:2406.19240 [pdf, other]

    cs.SE

    Data Preparation for Deep Learning based Code Smell Detection: A Systematic Literature Review

    Authors: Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu, Wenhua Hu

    Abstract: Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability. Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-based CSD methods heavily relies on the quality of the training data. Despite its importance, little attention has been paid to analyzing the data prepara…

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.19195 [pdf, other]

    cs.LG cs.AI

    Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights

    Authors: Zeqin Yang, Weilin Chen, Ruichu Cai, Yuguang Yan, Zhifeng Hao, Zhipeng Yu, Zhichao Zou, Zhen Peng, Jiecheng Guo

    Abstract: Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions to estimate long-term average effects, e.g., no unobserved confounders or a binary treatment, while in numerous real-world applications these assumptions could be violated and average effects are unable to provide individual-level suggestions. In this paper, we…

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.18859 [pdf, other]

    cs.CL cs.AI

    Two-Pronged Human Evaluation of ChatGPT Self-Correction in Radiology Report Simplification

    Authors: Ziyu Yang, Santhosh Cherian, Slobodan Vucetic

    Abstract: Radiology reports are highly technical documents aimed primarily at doctor-doctor communication. There has been an increasing interest in sharing those reports with patients, which requires patient-friendly simplifications of the original reports. This study explores the suitability of large language models for automatically generating those simplifications. We examine the usefulness…

    Submitted 26 June, 2024; originally announced June 2024.

  8. arXiv:2406.18565 [pdf, other]

    cs.CV cs.AI

    Pseudo-label Based Domain Adaptation for Zero-Shot Text Steganalysis

    Authors: Yufei Luo, Zhen Yang, Ru Zhang, Jianyi Liu

    Abstract: Currently, most methods for text steganalysis are based on deep neural networks (DNNs). However, in real-life scenarios, obtaining enough labeled stego-text to properly train networks with a large number of parameters is often challenging and costly. Additionally, due to a phenomenon known as dataset bias or domain shift, recognition models trained on a large dataset exhibit…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: The 30th International Conference on Computational & Experimental Engineering and Sciences (ICCES2024)

  9. arXiv:2406.16526 [pdf, other]

    cs.SE cs.AI

    NARRepair: Non-Autoregressive Code Generation Model for Automatic Program Repair

    Authors: Zhenyu Yang, Zhen Yang, Zhongxing Yu

    Abstract: With the advancement of deep learning techniques, the performance of Automatic Program Repair (APR) techniques has reached a new level. Previous deep learning-based APR techniques essentially modified program sentences in the Autoregressive (AR) manner, which predicts future values based on past values. Because of word-by-word generation, the AR-based APR technique incurs a large time delay. T…

    Submitted 24 June, 2024; originally announced June 2024.

  10. arXiv:2406.16330 [pdf, other]

    cs.CL cs.AI

    Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging

    Authors: Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Xi Chen, Cunhang Fan, Zhao Lv, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui

    Abstract: While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach…

    Submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.16252 [pdf, other]

    cs.LG cs.AI

    Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis

    Authors: Ajan Subramanian, Zhongqi Yang, Iman Azimi, Amir M. Rahmani

    Abstract: Health monitoring systems have revolutionized modern healthcare by enabling the continuous capture of physiological and behavioral data, essential for preventive measures and early health intervention. While integrating this data with Large Language Models (LLMs) has shown promise in delivering interactive health advice, traditional methods like Retrieval-Augmented Generation (RAG) and fine-tuning…

    Submitted 24 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  12. arXiv:2406.16213 [pdf, other]

    cs.LG

    Provable Statistical Rates for Consistency Diffusion Models

    Authors: Zehao Dou, Minshuo Chen, Mengdi Wang, Zhuoran Yang

    Abstract: Diffusion models have revolutionized various application domains, including computer vision and audio generation. Despite their state-of-the-art performance, diffusion models are known for slow sample generation due to the large number of steps involved. In response, consistency models have been developed to merge multiple steps in the sampling process, thereby significantly boosting the s…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 28 pages, 2 figures

  13. arXiv:2406.15673 [pdf, other]

    cs.CL cs.AI

    Large Language Models have Intrinsic Self-Correction Ability

    Authors: Dancheng Liu, Amir Nassereldine, Ziming Yang, Chenhui Xu, Yuting Hu, Jiajie Li, Utkarsh Kumar, Changjae Lee, Jinjun Xiong

    Abstract: Large language models (LLMs) have attracted significant attention for their remarkable abilities in various natural language processing tasks, but they suffer from hallucinations that degrade performance. One promising way to improve LLMs' performance is to ask them to revise their answers after generation, a technique known as self-correction. Among the two types of self-co…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: in submission

  14. arXiv:2406.15349 [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

    Authors: Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, Kashyap Chitta

    Abstract: Benchmarking vision-based driving policies is challenging. On one hand, open-loop evaluation with real data is easy, but these results do not reflect closed-loop performance. On the other, closed-loop evaluation is possible in simulation, but is hard to scale due to its significant computational demands. Further, the simulators available today exhibit a large domain gap to real data. This has resu…

    Submitted 21 June, 2024; originally announced June 2024.

  15. arXiv:2406.15132 [pdf, other]

    cs.LG cs.AI

    Younger: The First Dataset for Artificial Intelligence-Generated Neural Network Architecture

    Authors: Zhengxin Yang, Wanling Gao, Luzhou Peng, Yunyou Huang, Fei Tang, Jianfeng Zhan

    Abstract: Designing and optimizing neural network architectures typically requires extensive expertise, starting with handcrafted designs and then manual or automated refinement. This dependency presents a significant barrier to rapid innovation. Recognizing the complexity of automatically generating neural network architecture from scratch, we introduce Younger, a pioneering dataset to advance this ambitio…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 31 pages, 29 figures, 11 tables

  16. arXiv:2406.14939 [pdf, other]

    cs.IT eess.SP

    RIS-aided MIMO Beamforming: Piece-Wise Near-field Channel Model

    Authors: Weijian Chen, Zai Yang, Zhiqiang Wei, Derrick Wing Kwan Ng, Michail Matthaiou

    Abstract: This paper proposes a joint active and passive beamforming design for reconfigurable intelligent surface (RIS)-aided wireless communication systems, adopting a piece-wise near-field channel model. While a traditional near-field channel model, applied without any approximations, offers higher modeling accuracy than a far-field model, it renders the system design more sensitive to channel estimation…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 28 pages

  17. arXiv:2406.14877 [pdf, other]

    cs.CL

    Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video

    Authors: Zhengbang Yang, Haotian Xia, Jingxi Li, Zezhi Chen, Zhuangdi Zhu, Weining Shen

    Abstract: Understanding sports is crucial for the advancement of Natural Language Processing (NLP) due to its intricate and dynamic nature. Reasoning over complex sports scenarios has posed significant challenges to current NLP technologies which require advanced cognitive capabilities. Toward addressing the limitations of existing benchmarks on sports understanding in the NLP field, we extensively evaluate…

    Submitted 21 June, 2024; originally announced June 2024.

  18. arXiv:2406.14477 [pdf, other]

    cs.CV cs.AI cs.DB

    SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

    Authors: Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang

    Abstract: To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the SafeSora dataset to promote research on aligning text-to-video generation with human values. This dataset encompasses human preferences in text-to-video generation tasks along two primary dimensions: helpfulness and harmlessness. To capture in-depth human preferences and facilitate structured reasoning by cro…

    Submitted 20 June, 2024; originally announced June 2024.

  19. arXiv:2406.14095 [pdf, other]

    cs.LG cs.AI

    Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

    Authors: Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, Jonathan Scarlett, Zhanxing Zhu, Kenji Kawaguchi

    Abstract: Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the dem…

    Submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.13331 [pdf, other]

    cs.CL

    Improving Zero-shot LLM Re-Ranker with Risk Minimization

    Authors: Xiaowei Yuan, Zhao Yang, Yequan Wang, Jun Zhao, Kang Liu

    Abstract: In the Retrieval-Augmented Generation (RAG) system, advanced Large Language Models (LLMs) have emerged as effective unsupervised Query Likelihood Models (QLMs), which re-rank documents based on the probability of generating the query given the content of a document. However, directly prompting LLMs to approximate QLMs is inherently biased, where the estimated distribution might diverge f…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Under review

  21. arXiv:2406.12809 [pdf, other]

    cs.CL

    Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?

    Authors: Zhe Yang, Yichang Zhang, Tianyu Liu, Jian Yang, Junyang Lin, Chang Zhou, Zhifang Sui

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but still suffer from inconsistency issues (e.g., LLMs can react differently to disturbances like rephrasing or inconsequential order changes). In addition to these inconsistencies, we also observe that LLMs, while capable of solving hard problems, can paradoxically fail at easier ones. To evaluate this hard-to-easy inconsistenc…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 25 pages, 12 figures, 10 tables

  22. arXiv:2406.12793 [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  23. arXiv:2406.12252 [pdf, other]

    cs.CL

    Language and Multimodal Models in Sports: A Survey of Datasets and Applications

    Authors: Haotian Xia, Zhengbang Yang, Yun Zhao, Yuqing Wang, Jingxi Li, Rhys Tracy, Zhuangdi Zhu, Yuan-fang Wang, Hanjie Chen, Weining Shen

    Abstract: Recent integration of Natural Language Processing (NLP) and multimodal models has advanced the field of sports analytics. This survey presents a comprehensive review of the datasets and applications driving these innovations post-2020. We reviewed and categorized datasets into three primary types: language-based, multimodal, and convertible datasets. Language-based and multimodal datasets are fo…

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.12195 [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: Effectively implementing quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen…

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11208 [pdf]

    cs.NI

    Privacy-preserving Pseudonym Schemes for Personalized 3D Avatars in Mobile Social Metaverses

    Authors: Cheng Su, Xiaofeng Luo, Zhenmou Liu, Jiawen Kang, Min Hao, Zehui Xiong, Zhaohui Yang, Chongwen Huang

    Abstract: The emergence of mobile social metaverses, a novel paradigm bridging physical and virtual realms, has led to the widespread adoption of avatars as digital representations for Social Metaverse Users (SMUs) within virtual spaces. Equipped with immersive devices, SMUs leverage Edge Servers (ESs) to deploy their avatars and engage with other SMUs in virtual spaces. To enhance immersion, SMUs incline t…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures

  26. arXiv:2406.10842 [pdf, other]

    cs.CL cs.AI cs.HC

    Large Language Models for Automatic Milestone Detection in Group Discussions

    Authors: Zhuoxu Duan, Zhengye Yang, Samuel Westby, Christoph Riedl, Brooke Foucault Welles, Richard J. Radke

    Abstract: Large language models like GPT have proven widely successful on natural language understanding tasks based on written text documents. In this paper, we investigate an LLM's performance on recordings of a group oral communication task in which utterances are often truncated or not well-formed. We propose a new group task experiment involving a puzzle with several milestones that can be achieved in…

    Submitted 16 June, 2024; originally announced June 2024.

  27. arXiv:2406.10802 [pdf, other]

    cs.CL cs.AI

    KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

    Authors: Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

    Abstract: Existing frameworks for assessing robustness of large language models (LLMs) overly depend on specific benchmarks, increasing costs and failing to evaluate performance of LLMs in professional domains due to dataset limitations. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework…

    Submitted 16 June, 2024; originally announced June 2024.

  28. arXiv:2406.10667 [pdf, other]

    cs.LG

    UniZero: Generalized and Efficient Planning with Scalable Latent World Models

    Authors: Yuan Pu, Yazhe Niu, Jiyuan Ren, Zhenjie Yang, Hongsheng Li, Yu Liu

    Abstract: Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates ra…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 32 pages, 16 figures

  29. arXiv:2406.10227 [pdf, other]

    cs.CV cs.AI

    VideoGUI: A Benchmark for GUI Automation from Instructional Videos

    Authors: Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen WU, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-c…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 24 pages, 16 tables, 17 figures

  30. arXiv:2406.10125 [pdf, other]

    cs.CV

    MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

    Authors: Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu

    Abstract: Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and…

    Submitted 14 June, 2024; originally announced June 2024.

  31. arXiv:2406.09833 [pdf, other]

    cs.AI cs.MM cs.SD eess.AS

    SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering

    Authors: Zhe Yang, Wenrui Li, Guanghui Cheng

    Abstract: The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature extraction and fusion processes more challenging. Euclidean space struggles to effectively represent multi-dimensional relationships in data. Especially when extracting and processing data with a tree structure or…

    Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  32. arXiv:2406.09792 [pdf, other]

    cs.CV

    A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion

    Authors: Kailai Sun, Zhou Yang, Qianchuan Zhao

    Abstract: Depth images have a wide range of applications, such as 3D reconstruction, autonomous driving, augmented reality, robot navigation, and scene understanding. Commodity-grade depth cameras struggle to sense depth for bright, glossy, transparent, and distant surfaces. Although existing depth completion methods have achieved remarkable progress, their performance is limited when applied to complex ind…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop

  33. arXiv:2406.09215 [pdf, other]

    cs.IR cs.AI

    On Softmax Direct Preference Optimization for Recommendation

    Authors: Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

    Abstract: Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most of the LM-based recommenders convert historical interactions into language prompts, pairing with a positive item as the target response and fine-t…

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  34. Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

    Authors: Jingyuan Xia, Zhixiong Yang, Shengxi Li, Shuanghui Zhang, Yaowen Fu, Deniz Gündüz, Xiang Li

    Abstract: Learning-based approaches have witnessed great successes in blind single image super-resolution (SISR) tasks; however, handcrafted kernel priors or learning-based kernel priors are typically required. In this paper, we propose a Meta-learning and Markov Chain Monte Carlo (MCMC) based SISR approach to learn kernel priors from organized randomness. Concretely, a lightweight network is adopted as k…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  35. arXiv:2406.08407 [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models": interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit that videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  36. arXiv:2406.07056 [pdf, other]

    cs.CL

    Effectively Compress KV Heads for LLM

    Authors: Hao Yu, Zelan Yang, Shen Li, Yong Li, Jianxin Wu

    Abstract: The advent of pre-trained large language models (LLMs) has revolutionized various natural language processing tasks. These models predominantly employ an auto-regressive decoding mechanism that utilizes Key-Value (KV) caches to eliminate redundant calculations for previous tokens. Nevertheless, as context lengths and batch sizes increase, the linear expansion in memory footprint of KV caches becom…

    Submitted 11 June, 2024; originally announced June 2024.

  37. arXiv:2406.06890 [pdf, other]

    cs.CV

    Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

    Authors: Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

    Abstract: Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation wh…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Project page: https://yhzhai.github.io/mcm/

  38. arXiv:2406.06474 [pdf, other]

    cs.AI cs.CL

    Towards a Personal Health Large Language Model

    Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, et al. (9 additional authors not shown)

    Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 72 pages

  39. arXiv:2406.05988 [pdf, other]

    cs.GT

    Sponsored Search Auction Design Beyond Single Utility Maximization

    Authors: Changfeng Xu, Chao Peng, Chenyang Xu, Zhengfeng Yang

    Abstract: Auction design for the modern advertising market has gained significant prominence in the field of game theory. With the recent rise of auto-bidding tools, an increasing number of advertisers in the market are utilizing these tools for auctions. The diverse array of auto-bidding tools has made auction design more challenging. Various types of bidders, such as quasi-linear utility maximizers and co…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: To appear in COCOON 2024

  40. arXiv:2406.05892 [pdf, other]

    cs.CR cs.LG cs.SE

    Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models

    Authors: Aidan Z. H. Yang, Haoye Tian, He Ye, Ruben Martins, Claire Le Goues

    Abstract: Software security vulnerabilities allow attackers to perform malicious activities to disrupt software operations. Recent Transformer-based language models have significantly advanced vulnerability detection, surpassing the capabilities of static-analysis-based deep learning models. However, language models trained solely on code tokens do not capture either the explanation of vulnerability type or…

    Submitted 9 June, 2024; originally announced June 2024.

  41. arXiv:2406.04681 [pdf, other]

    math.OC cs.SC math.AG

    Whitney Stratification of Algebraic Boundaries of Convex Semi-algebraic Sets

    Authors: Zihao Dai, Zijia Li, Zhi-Hong Yang, Lihong Zhi

    Abstract: Algebraic boundaries of convex semi-algebraic sets are closely related to polynomial optimization problems. Building upon Rainer Sinn's work, we refine the stratification of iterated singular loci to a Whitney (a) stratification, which gives a list of candidates of varieties whose dual is an irreducible component of the algebraic boundary of the dual convex body. We also present an algorithm based…

    Submitted 7 June, 2024; originally announced June 2024.

    MSC Class: 52A99; 14N05; 14P10; 51N35; 14Q15

  42. arXiv:2406.03877 [pdf, other]

    cs.RO cs.CV

    Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving

    Authors: Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, Junchi Yan

    Abstract: In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold where end-to-end autonomous driving (E2E-AD) emerges due to its potential to scale up in a data-driven manner. However, existing E2E-AD methods are mostly evaluated in an open-loop log-replay manner with L2 errors and collision rate as metrics (e.g., in nuS…

    Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Fix typos in text and Table 4. More references

  43. arXiv:2406.03249  [pdf, other]

    cs.LG

    Near-field Beamforming for Extremely Large-scale MIMO Based on Unsupervised Deep Learning

    Authors: Jiali Nie, Yuanhao Cui, Zhaohui Yang, Weijie Yuan, Xiaojun Jing

    Abstract: Extremely Large-scale Array (ELAA) is considered a frontier technology for future communication systems, pivotal in improving wireless systems' rate and spectral efficiency. However, as ELAA employs a multitude of antennas operating at higher frequencies, users are typically situated in the near-field region where the spherical wavefront propagates. This inevitably leads to a significant increase…

    Submitted 5 June, 2024; originally announced June 2024.

  44. arXiv:2406.03159  [pdf, other]

    cs.NI cs.DC

    Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading

    Authors: Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao

    Abstract: Low-orbit mega-constellation network, which utilizes thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. Each satellite collects TB-level data daily, including delay-sensitive data used for crucial tasks, such as military surveillance, natural disaster monitoring, and weather forecasting. According to NASA's sta…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

  45. arXiv:2406.03143  [pdf, other]

    cs.CV cs.CR

    ZeroPur: Succinct Training-Free Adversarial Purification

    Authors: Xiuli Bi, Zonglin Yang, Bo Liu, Xiaodong Cun, Chi-Man Pun, Pietro Lio, Bin Xiao

    Abstract: Adversarial purification is a kind of defense technique that can defend various unseen adversarial attacks without modifying the victim classifier. Existing methods often depend on external generative models or cooperation between auxiliary functions and victim classifiers. However, retraining generative models, auxiliary functions, or victim classifiers relies on the domain of the fine-tuned data…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures, under review

  46. arXiv:2406.02865  [pdf]

    cs.RO

    Dynamically Expanding Capacity of Autonomous Driving with Near-Miss Focused Training Framework

    Authors: Ziyuan Yang, Zhaoyang Li, Jianming Hu, Yi Zhang

    Abstract: The long-tail distribution of real driving data poses challenges for training and testing autonomous vehicles (AV), where rare yet crucial safety-critical scenarios are infrequent. And virtual simulation offers a low-cost and efficient solution. This paper proposes a near-miss focused training framework for AV. Utilizing the driving scenario information provided by sensors in the simulator, we des…

    Submitted 4 June, 2024; originally announced June 2024.

  47. arXiv:2406.02642  [pdf, other]

    cs.LG cs.AI

    E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

    Authors: Zhou Yang, Zhaochun Ren, Chenglong Ye, Yufeng Wang, Haizhou Sun, Chao Chen, Xiaofei Zhu, Yunbing Wu, Xiangwen Liao

    Abstract: In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performanc…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures, 5 tables

  48. arXiv:2406.01940  [pdf, other]

    cs.CL cs.LG cs.LO

    Process-Driven Autoformalization in Lean 4

    Authors: Jianqiao Lu, Zhengying Liu, Yingjia Wan, Yinya Huang, Haiming Wang, Zhicheng Yang, Jing Tang, Zhijiang Guo

    Abstract: Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning. However, existing efforts are limited to formal languages with substantial online corpora and struggle to keep pace with rapidly evolving languages like Lean 4. To bridge this gap, we propose a new benchmark \textbf{Form}alization for \textbf{L…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 1 figure, 11 tables

  49. arXiv:2406.01579  [pdf, other]

    cs.CV

    Tetrahedron Splatting for 3D Generation

    Authors: Chun Gu, Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang

    Abstract: 3D representation is essential to the significant advance of 3D generation with 2D diffusion priors. As a flexible representation, NeRF has been first adopted for 3D representation. With density-based volumetric rendering, it however suffers both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh ext…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/fudan-zvg/tet-splatting

  50. arXiv:2406.01179  [pdf, other]

    cs.CL cs.AI

    Are AI-Generated Text Detectors Robust to Adversarial Perturbations?

    Authors: Guanhua Huang, Yuchen Zhang, Zhe Li, Yongjian You, Mingze Wang, Zhouwang Yang

    Abstract: The widespread use of large language models (LLMs) has sparked concerns about the potential misuse of AI-generated text, as these models can produce content that closely resembles human-generated text. Current detectors for AI-generated text (AIGT) lack robustness against adversarial perturbations, with even minor changes in characters or words causing a reversal in distinguishing between human-cr…

    Submitted 26 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference