
Showing 1–50 of 1,127 results for author: Shi, Z

  1. arXiv:2407.00987  [pdf, other]

    cs.NI eess.SY

    Exploiting Dependency-Aware Priority Adjustment for Mixed-Criticality TSN Flow Scheduling

    Authors: Miao Guo, Yifei Sun, Chaojie Gu, Shibo He, Zhiguo Shi

    Abstract: Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IWQoS'24

  2. arXiv:2406.19043  [pdf]

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  3. arXiv:2406.18993  [pdf, ps, other]

    eess.SP

    Interference Cancellation Based Neural Receiver for Superimposed Pilot in Multi-Layer Transmission

    Authors: Han Xiao, Wenqiang Tian, Shi Jin, Wendong Liu, Jia Shen, Zhihua Shi, Zhi Zhang

    Abstract: In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed in the same time-frequency resource. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, the interference cancellation with superimposed symbol ai…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.17803  [pdf, other]

    cs.CL cs.AI cs.IR

    Understanding the Role of User Profile in the Personalization of Large Language Models

    Authors: Bin Wu, Zhengyan Shi, Hossein A. Rahmani, Varsha Ramineni, Emine Yilmaz

    Abstract: Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance the performance on a wide range of tasks. However, the precise role of user profiles and their effect mechanism on LLMs remains unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we inves…

    Submitted 22 June, 2024; originally announced June 2024.

  5. arXiv:2406.16583  [pdf, other]

    cs.LG cs.CV

    Personalized federated learning based on feature fusion

    Authors: Wolong Xing, Zhenkui Shi, Hongyan Peng, Xiantao Hu, Xianxian Li

    Abstract: Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy. However, due to the heterogeneity of data, models, and devices, the final global model may need to perform better for tasks on each client. Communication bottlenecks, data heterogeneity, and model heterogeneity have been common challenges in federated learning. In t…

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16285  [pdf, other]

    math.NA math.OC

    Gradient enhanced ADMM Algorithm for dynamic optimal transport on surfaces

    Authors: Guozhi Dong, Hailong Guo, Chengrun Jiang, Zuoqiang Shi

    Abstract: A gradient enhanced ADMM algorithm for optimal transport on general surfaces is proposed in this paper. Based on Benamou and Brenier's dynamical formulation, we combine gradient recovery techniques on surfaces with the ADMM algorithm, not only improving the computational accuracy, but also providing a novel method to deal with dual variables in the algorithm. This method avoids the use of stagger…

    Submitted 23 June, 2024; originally announced June 2024.

    MSC Class: 65M22; 49M41

  7. arXiv:2406.14891  [pdf, other]

    cs.CL cs.IR

    Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering

    Authors: Zhengliang Shi, Shuo Zhang, Weiwei Sun, Shen Gao, Pengjie Ren, Zhumin Chen, Zhaochun Ren

    Abstract: Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable no…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (main conference)

  8. arXiv:2406.14852  [pdf, other]

    cs.CV cs.AI

    Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models

    Authors: Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Neel Joshi

    Abstract: Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains. Despite this promise, spatial understanding and reasoning -- a fundamental component of human cognition -- remains under-explored. We develop novel benchmarks that cover diverse aspects of spatial reasoning such as relationship understanding, navigation,…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.14036  [pdf, other]

    cs.LG cs.AI cs.CL

    Toward Infinite-Long Prefix in Transformer

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

    Abstract: Prompting and contextual-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks that can match full parameter fine-tuning. There remains a limited theoretical understanding of how these methods work. In this paper, we aim to relieve this limitation by studying the learning ability of Prefix Learning fro…

    Submitted 20 June, 2024; originally announced June 2024.

  10. arXiv:2406.13975  [pdf, other]

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  11. arXiv:2406.13317  [pdf, other]

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft…

    Submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.11596  [pdf, other]

    cond-mat.mes-hall physics.app-ph

    A hybrid graphene-siliconnitride nanomembrane as a versatile and ultra-widely tunable mechanical device

    Authors: Mengqi Fu, Bojan Bošnjak, Zhan Shi, Jannik Dornseiff, Robert H. Blick, Elke Scheer, Fan Yang

    Abstract: Integration of 2D materials in nanoelectromechanical systems (NEMS) marries the robustness of silicon-based materials with exceptional electrical controllability in 2D materials, drastically enhancing system performance which now is the key for many advanced applications in nanotechnology. Here, we experimentally demonstrate and theoretically analyze a powerful on-chip graphene integrated NEMS dev…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.10678  [pdf, other]

    cs.CV

    A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection

    Authors: Chenyao Zhou, Haotian Zhang, Han Guo, Zhengxia Zou, Zhenwei Shi

    Abstract: Semantic change detection is an important task in geoscience and earth observation. By producing a semantic change map for each temporal phase, both the land use land cover categories and change information can be interpreted. Recently some multi-task learning based semantic change detection methods have been proposed to decompose the task into semantic segmentation and binary change detection sub…

    Submitted 15 June, 2024; originally announced June 2024.

  14. arXiv:2406.09687  [pdf]

    cond-mat.mes-hall cond-mat.str-el

    Interplay between topology and correlations in the second moiré band of twisted bilayer MoTe2

    Authors: Fan Xu, Xumin Chang, Jiayong Xiao, Yixin Zhang, Feng Liu, Zheng Sun, Ning Mao, Nikolai Peshcherenko, Jiayi Li, Kenji Watanabe, Takashi Taniguchi, Bingbing Tong, Li Lu, Jinfeng Jia, Dong Qian, Zhiwen Shi, Yang Zhang, Xiaoxue Liu, Shengwei Jiang, Tingxin Li

    Abstract: Topological flat bands formed in two-dimensional lattice systems offer unique opportunity to study the fractional phases of matter in the absence of an external magnetic field. Celebrated examples include fractional quantum anomalous Hall (FQAH) effects and fractional topological insulators. Recently, FQAH effects have been experimentally realized in both the twisted bilayer MoTe2 (tMoTe2) system…

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.07650  [pdf, ps, other]

    math.AP math.CV

    A unique continuation property for $|\overline \partial u| \leq V |u|$

    Authors: Ziming Shi

    Abstract: Let $u: Ω\subset \mathbb C^n \to \mathbb C^m$, for $n \geq 2$ and $m \geq 1$. Let $1 \leq p \leq 2$, and $2(2n)^2 -1 \leq q < \infty$ such that $\displaystyle \frac{1}{p} + \frac{1}{p'} = 1$ and $\displaystyle \frac{1}{p} - \frac{1}{p'} = \frac{1}{q}$. Suppose $|\overline \partial u| \leq V |u|$, where $V \in L^q_{\operatorname{loc}}(Ω)$. Then $u$ has a unique continuation property in the followin…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 28 pages

  16. arXiv:2406.05628  [pdf, other]

    cs.LG

    Domain Generalization Guided by Large-Scale Pre-Trained Priors

    Authors: Zongbin Wang, Bin Pan, Shiyu Shen, Tianyang Shi, Zhenwei Shi

    Abstract: Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to m…

    Submitted 8 June, 2024; originally announced June 2024.

  17. arXiv:2406.05616  [pdf, other]

    cs.LG

    Domain Agnostic Conditional Invariant Predictions for Domain Generalization

    Authors: Zongbin Wang, Bin Pan, Zhenwei Shi

    Abstract: Domain generalization aims to develop a model that can perform well on unseen target domains by learning from multiple source domains. However, recent-proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and the corresponding algorithm to capt…

    Submitted 8 June, 2024; originally announced June 2024.

  18. arXiv:2406.05112  [pdf]

    cond-mat.mes-hall physics.optics

    Ohms law lost and regained: observation and impact of zeros and poles

    Authors: Krishna Joshi, Israel Kurtz, Zhou Shi, Azriel Z. Genack

    Abstract: The quantum conductance and its classical wave analogue, the transmittance, are given by the sum of the eigenvalues of the transmission matrix. The lowest transmission eigenvalue in diffusive media might be expected to play a negligible role in the conductance, and, in any case, to be too small to be observed. Here, we observe the lowest transmission eigenchannel in microwave waveguides, though it…

    Submitted 7 June, 2024; originally announced June 2024.

  19. arXiv:2406.04207  [pdf, other]

    cs.CV

    CDMamba: Remote Sensing Image Change Detection with Mamba

    Authors: Haotian Zhang, Keyan Chen, Chenyang Liu, Hao Chen, Zhengxia Zou, Zhenwei Shi

    Abstract: Recently, the Mamba architecture based on state space models has demonstrated remarkable performance in a series of natural language processing tasks and has been rapidly applied to remote sensing change detection (CD) tasks. However, most methods enhance the global receptive field by directly modifying the scanning mode of Mamba, neglecting the crucial role that local information plays in dense p…

    Submitted 6 June, 2024; originally announced June 2024.

  20. arXiv:2406.02037  [pdf]

    cs.CV

    Multi-Scale Direction-Aware Network for Infrared Small Target Detection

    Authors: Jinmiao Zhao, Zelin Shi, Chuang Yu, Yunpeng Liu

    Abstract: Infrared small target detection faces the problem that it is difficult to effectively separate the background and the target. Existing deep learning-based methods focus on appearance features and ignore high-frequency directional features. Therefore, we propose a multi-scale direction-aware network (MSDA-Net), which is the first attempt to integrate the high-frequency directional features of infra…

    Submitted 4 June, 2024; originally announced June 2024.

  21. arXiv:2406.00738  [pdf, other]

    cs.LG cs.AI cs.CY

    Global Rewards in Restless Multi-Armed Bandits

    Authors: Naveen Raman, Zheyuan Ryan Shi, Fei Fang

    Abstract: Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states. Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms. We address this deficiency by proposing restless-multi-armed bandit with global rewards (RMAB-G), a generalization of RMABs to global non-separable rewards. To solve RMAB-G, we develop…

    Submitted 7 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: 27 pages

  22. arXiv:2405.21063  [pdf, other]

    cs.LG cs.AI

    Neural Network Verification with Branch-and-Bound for General Nonlinearities

    Authors: Zhouxing Shi, Qirui Jin, Zico Kolter, Suman Jana, Cho-Jui Hsieh, Huan Zhang

    Abstract: Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide whic…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Preprint

  23. arXiv:2405.19592  [pdf, other]

    cs.LG cs.AI cs.CL

    Why Larger Language Models Do In-context Learning Differently?

    Authors: Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

    Abstract: Large language models (LLM) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL), where they can perform well on unseen tasks based on a brief series of task examples without necessitating any adjustments to the model parameters. One recent interesting mysterious observation is that models of different scales may have different ICL behaviors: larger models tend…

    Submitted 29 May, 2024; originally announced May 2024.

  24. arXiv:2405.17193  [pdf, other]

    cs.GR

    Anisotropic Gauss Reconstruction for Unoriented Point Clouds

    Authors: Yueji Ma, Dong Xiao, Zuoqiang Shi, Bin Wang

    Abstract: Unoriented surface reconstructions based on the Gauss formula have attracted much attention due to their elegant mathematical formulation and excellent performance. However, the isotropic characteristics of the formulation limit their capacity to leverage the anisotropic information within the point cloud. In this work, we propose a novel anisotropic formulation by introducing a convection term in…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 17 pages; 14 figures

  25. arXiv:2405.16634  [pdf, other]

    cs.GR

    Fast and Globally Consistent Normal Orientation based on the Winding Number Normal Consistency

    Authors: Siyou Lin, Zuoqiang Shi, Yebin Liu

    Abstract: Estimating a consistently oriented normal vector field for an unoriented point cloud enables a number of important downstream applications in computer graphics. While normal estimation for a small patch of points can be done with simple techniques like principal component analysis (PCA), orienting these normals to be globally consistent has been a notoriously difficult problem. Some recent methods…

    Submitted 26 May, 2024; originally announced May 2024.

  26. arXiv:2405.16533  [pdf, other]

    cs.CL

    Chain of Tools: Large Language Model is an Automatic Multi-tool Learner

    Authors: Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Suzan Verberne, Zhaochun Ren

    Abstract: Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, empowering them to solve practical tasks. Existing work typically empowers LLMs as tool users with a manually designed workflow, where the LLM plans a series of tools in a step-by-step manner, and sequentially executes each tool to obtain intermediate results until deriving the…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Work in progress

  27. arXiv:2405.16418  [pdf, other]

    cs.LG cs.AI cs.CV

    Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

    Abstract: Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper, we bridge this gap by providing a detailed examination of these smoothness properties for the case where the target data distribution is a mixtur…

    Submitted 25 May, 2024; originally announced May 2024.

  28. arXiv:2405.16411  [pdf, other]

    cs.LG cs.AI cs.CL

    Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

    Abstract: Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention. However, the $Ω(n^3)$ time complexity of tensor attention poses a significant obstacle to its practical implementation in transformers, where $n$ is the input sequence length. In this work, we prove that the…

    Submitted 25 May, 2024; originally announced May 2024.

  29. arXiv:2405.14602  [pdf, other]

    cs.LG

    Controllable Continual Test-Time Adaptation

    Authors: Ziqi Shi, Fan Lyu, Ye Liu, Fanhua Shang, Fuyuan Hu, Wei Feng, Zhang Zhang, Liang Wang

    Abstract: Continual Test-Time Adaptation (CTTA) is an emerging and challenging task where a model trained in a source domain must adapt to continuously changing conditions during testing, without access to the original source data. CTTA is prone to error accumulation due to uncontrollable domain shifts, leading to blurred decision boundaries between categories. Existing CTTA methods primarily focus on suppr…

    Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  30. arXiv:2405.14394  [pdf, other]

    cs.CL cs.AI

    Instruction Tuning With Loss Over Instructions

    Authors: Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani

    Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/ZhengxiangShi/InstructionModelling

  31. arXiv:2405.13570  [pdf, other]

    cs.CV

    MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation

    Authors: Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou

    Abstract: The recent advancement of generative foundational models has ushered in a new era of image generation in the realm of natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Despite producing high-quality samples, existing methods are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation mo…

    Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Project page: https://jiupinjia.github.io/metaearth/

  32. arXiv:2405.09964  [pdf, other]

    cs.CV

    KPNDepth: Depth Estimation of Lane Images under Complex Rainy Environment

    Authors: Zhengxu Shi

    Abstract: With the development of deep neural network generative models in recent years, significant progress has been made in the research of depth estimation in lane scenes. However, current research achievements are mainly focused on clear daytime scenarios. In complex rainy environments, the influence of rain streaks and local fog effects often leads to erroneous increases in the overall depth estimatio…

    Submitted 16 May, 2024; originally announced May 2024.

  33. arXiv:2405.09820  [pdf, other]

    cs.LG cs.CV

    Densely Distilling Cumulative Knowledge for Continual Learning

    Authors: Zenglin Shi, Pei Liu, Tong Su, Yunpeng Wu, Kuien Liu, Yu Song, Meng Wang

    Abstract: Continual learning, involving sequential training on diverse tasks, often faces catastrophic forgetting. While knowledge distillation-based approaches exhibit notable success in preventing forgetting, we pinpoint a limitation in their ability to distill the cumulative knowledge of all the previous tasks. To remedy this, we propose Dense Knowledge Distillation (DKD). DKD uses a task pool to track t…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 12 pages; Continual Learning; Class-incremental Learning; Knowledge Distillation; Forgetting

  34. arXiv:2405.09256  [pdf, other]

    physics.flu-dyn physics.data-an

    Drag prediction of rough-wall turbulent flow using data-driven regression

    Authors: Zhaoyu Shi, Seyed Morteza Habibi Khorasani, Heesoo Shin, Jiasheng Yang, Sangseung Lee, Shervin Bagheri

    Abstract: Efficient tools for predicting the drag of rough walls in turbulent flows would have a tremendous impact. However, methods for drag prediction rely on experiments or numerical simulations which are costly and time-consuming. Data-driven regression methods have the potential to provide a prediction that is accurate and fast. We assess the performance and limitations of linear regression, kernel met…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: This manuscript consists of 18 pages, where 2 appendices, 11 figures and 3 tables are included. It is currently under review at FLOW journal. Dr. Zhaoyu Shi developed the machine learning models and conducted the data analysis as well as the draft writing. The direct numerical simulations (see Appendix A) were conducted by Dr S.M.H. Khorasani

  35. Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning

    Authors: Xiaoling Zhang, Zhengzi Xu, Shouguo Yang, Zhi Li, Zhiqiang Shi, Limin Sun

    Abstract: Reverse engineers would acquire valuable insights from descriptive function names, which are absent in publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches encounter difficulties in capturing function semantics in diverse optimized binaries and fail to reserve the meaning of labels in function n…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 24 pages, 10 figures, ACM ESEC/FSE 2024

    Journal ref: Proc. ACM Softw. Eng. 1,FSE, Article 75 (July 2024), 24 pages

  36. arXiv:2405.09071  [pdf, other]

    physics.flu-dyn

    Data-driven discovery of drag-inducing elements on a rough surface through convolutional neural networks

    Authors: Heesoo Shin, Seyed Morteza Habibi Khorasani, Zhaoyu Shi, Jiasheng Yang, Sangseung Lee, Shervin Bagheri

    Abstract: Understanding the influence of surface roughness on drag forces remains a significant challenge in fluid dynamics. This paper presents a convolutional neural network (CNN) that predicts drag solely by the topography of rough surfaces and is capable of discovering spatial patterns linked to drag-inducing structures. A CNN model was developed to analyze spatial information from the topography of a r…

    Submitted 14 May, 2024; originally announced May 2024.

  37. arXiv:2405.06683  [pdf, other]

    cs.CL

    ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization

    Authors: Yunxiao Shi, Xing Zi, Zijing Shi, Haimin Zhang, Qiang Wu, Min Xu

    Abstract: Retrieval-augmented generation (RAG) for language models significantly improves language understanding systems. The basic retrieval-then-read pipeline of response generation has evolved into a more extended process due to the integration of various components, sometimes even forming loop structures. Despite its advancements in improving response accuracy, challenges like poor retrieval quality for…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Draft Paper

  38. arXiv:2405.05219  [pdf, other]

    cs.LG cs.AI

    Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

    Authors: Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin

    Abstract: Large Language Models (LLMs) have profoundly changed the world. Their self-attention mechanism is the key to the success of transformers in LLMs. However, the quadratic computational cost $O(n^2)$ to the length $n$ input sequence is the notorious obstacle for further improvement and scalability in the longer context. In this work, we leverage the convolution-like structure of attention matrices to…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 55 pages

  39. arXiv:2405.04122  [pdf, other]

    cs.LG cs.DC

    Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning

    Authors: Chunlin Tian, Zhan Shi, Xinpeng Qin, Li Li, Chengzhong Xu

    Abstract: Federated Learning (FL) enables multiple devices to collaboratively train a shared model while ensuring data privacy. The selection of participating devices in each training round critically affects both the model performance and training efficiency, especially given the vast heterogeneity in training capabilities and data distribution across devices. To address these challenges, we introduce a no…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  40. arXiv:2405.03251  [pdf, ps, other]

    cs.LG cs.AI

    Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

    Authors: Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song

    Abstract: The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture. However, the underlying learning dynamics that contribute to the effectiveness of softmax remain largely unexplored. As a step towards better understanding, this paper provides a theoretical study of the op…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 53 pages

  41. arXiv:2405.02812  [pdf, other]

    quant-ph

    Neural Network Enhanced Single-Photon Fock State Tomography

    Authors: Hsien-Yi Hsieh, Yi-Ru Chen, Jingyu Ning, Hsun-Chung Wu, Hua Li Chen, Zi-Hao Shi, Po-Han Wang, Ole Steuernagel, Chien-Ming Wu, Ray-Kuang Lee

    Abstract: Even though heralded single-photon sources have been generated routinely through the spontaneous parametric down conversion, vacuum and multiple photon states are unavoidably involved. With machine-learning, we report the experimental implementation of single-photon quantum state tomography by directly estimating target parameters. Compared to the Hanbury Brown and Twiss (HBT) measurements only wi…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures

  42. arXiv:2404.19608  [pdf, other]

    cond-mat.quant-gas

    Three-dimensional Moiré Crystal

    Authors: Ce Wang, Chao Gao, Jing Zhang, Hui Zhai, Zhe-Yu Shi

    Abstract: The work intends to extend the moiré physics to three dimensions. Three-dimensional moiré patterns can be realized in ultracold atomic gases by coupling two spin states in spin-dependent optical lattices with a relative twist, a structure currently unachievable in solid-state materials. We give the commensurate conditions under which the three-dimensional moiré pattern features a periodic structur…

    Submitted 30 April, 2024; originally announced April 2024.

  43. arXiv:2404.19245  [pdf, other]

    cs.CL cs.AI

    HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

    Authors: Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, Chengzhong Xu

    Abstract: Adapting Large Language Models (LLMs) to new tasks through fine-tuning has been made more efficient by the introduction of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. However, these methods often underperform compared to full fine-tuning, particularly in scenarios involving complex datasets. This issue becomes even more pronounced in complex domains, highlighting the need for…

    Submitted 23 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures

  44. arXiv:2404.18955  [pdf, other]

    cs.NE cs.AI

    GARA: A novel approach to Improve Genetic Algorithms' Accuracy and Efficiency by Utilizing Relationships among Genes

    Authors: Zhaoning Shi, Meng Xiang, Zhaoyang Hai, Xiabi Liu, Yan Pei

    Abstract: Genetic algorithms have played an important role in engineering optimization. Traditional GAs treat each gene separately. However, biophysical studies of gene regulatory networks revealed direct associations between different genes. It inspires us to propose an improvement to GA in this paper, Gene Regulatory Genetic Algorithm (GRGA), which, to our best knowledge, is the first time to utilize rela…

    Submitted 28 April, 2024; originally announced April 2024.

  45. arXiv:2404.18895  [pdf, other]

    cs.CV

    RSCaMa: Remote Sensing Image Change Captioning with State Space Model

    Authors: Chenyang Liu, Keyan Chen, Bowen Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi

    Abstract: Remote Sensing Image Change Captioning (RSICC) aims to describe surface changes between multi-temporal remote sensing images in language, including the changed object categories, locations, and dynamics of changing objects (e.g., added or disappeared). This poses challenges to spatial and temporal modeling of bi-temporal features. Despite previous methods progressing in the spatial change percepti…

    Submitted 21 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  46. arXiv:2404.18848  [pdf, other]

    cs.LG cs.AI cs.CL

    FeDeRA:Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition

    Authors: Yuxuan Yan, Qianqian Yang, Shunpu Tang, Zhiguo Shi

    Abstract: Despite their exceptional performance on various tasks after fine-tuning, pre-trained language models (PLMs) face significant challenges due to growing privacy concerns with data in centralized training methods. We consider federated learning (FL) to fine-tune PLMs in this paper. However, the substantial number of parameters in PLMs poses significant difficulties for client devices with limited co…

    Submitted 25 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  47. arXiv:2404.18085  [pdf, other]

    cs.CL

    CRE-LLM: A Domain-Specific Chinese Relation Extraction Framework with Fine-tuned Large Language Model

    Authors: Zhengpeng Shi, Haoran Luo

    Abstract: Domain-Specific Chinese Relation Extraction (DSCRE) aims to extract relations between entities from domain-specific Chinese text. Despite the rapid development of PLMs in recent years, especially LLMs, DSCRE still faces three core challenges: complex network structure design, poor awareness, and high consumption of fine-tuning. Given the impressive performance of large language models (LLMs) in na…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: preprint

  48. arXiv:2404.18074  [pdf, other]

    cs.AI cs.HC

    MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

    Authors: Zirui Song, Yaohang Li, Meng Fang, Zhenhao Chen, Zecheng Shi, Yuan Huang, Ling Chen

    Abstract: Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, restricting their versatility. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), a framework utilizes the collective expertise of diverse agents to enhance interaction ability with operating systems. The framework introduces a team collaboration ch…

    Submitted 4 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: In processing

  49. arXiv:2404.15284  [pdf, other]

    eess.SP cs.AI

    Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays

    Authors: Dijia Cai, Zenghui Shi, Haiyang Fu, Huan Liu, Hongyi Qian, Yun Sui, Feng Xu, Ya-Qiu Jin

    Abstract: The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. Th…

    Submitted 12 March, 2024; originally announced April 2024.

  50. arXiv:2404.12814  [pdf, other]

    cs.LG cs.AI cs.CV

    Generative Modelling with High-Order Langevin Dynamics

    Authors: Ziqiang Shi, Rujie Liu

    Abstract: Diffusion generative modelling (DGM) based on stochastic differential equations (SDEs) with score matching has achieved unprecedented results in data generation. In this paper, we propose a novel fast high-quality generative modelling method based on high-order Langevin dynamics (HOLD) with score matching. This motive is proved by third-order Langevin dynamics. By augmenting the previous SDEs, e.g…

    Submitted 21 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: Some of the results in this paper have been published or accepted at conferences such as wacv2024, icassp2024, and icme2024