Showing 1–50 of 648 results for author: Yuan, J

Searching in archive cs.

  1. arXiv:2407.01527  [pdf, other]

    cs.CL

    KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

    Authors: Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

    Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00468  [pdf, other]

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  3. arXiv:2406.19781  [pdf, other]

    cs.RO

    LCSim: A Large-Scale Controllable Traffic Simulator

    Authors: Yuheng Zhang, Tianjian Ouyang, Fudan Yu, Cong Ma, Lei Qiao, Wei Wu, Jian Yuan, Yong Li

    Abstract: With the rapid development of urban transportation and the continuous advancement in autonomous vehicles, the demand for safely and efficiently testing autonomous driving and traffic optimization algorithms arises, which needs accurate modeling of large-scale urban traffic scenarios. Existing traffic simulation systems encounter two significant limitations. Firstly, they often rely on open-source…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

  4. arXiv:2406.19400  [pdf, other]

    cs.CV

    Deep Convolutional Neural Networks Meet Variational Shape Compactness Priors for Image Segmentation

    Authors: Kehui Zhang, Lingfeng Li, Hao Liu, Jing Yuan, Xue-Cheng Tai

    Abstract: Shape compactness is a key geometrical property to describe interesting regions in many image segmentation tasks. In this paper, we propose two novel algorithms to solve the introduced image segmentation problem that incorporates a shape-compactness prior. Existing algorithms for such a problem often suffer from computational inefficiency, difficulty in reaching a local minimum, and the need to fi…

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: 28 pages

  5. arXiv:2406.18548  [pdf]

    eess.IV cs.CV

    Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

    Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

    Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is a…

    Submitted 23 May, 2024; originally announced June 2024.

  6. arXiv:2406.14045  [pdf, other]

    cs.LG cs.AI

    Understanding Different Design Choices in Training Large Time Series Models

    Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Qiaoyu Tan, Daochen Zha, Xia Hu

    Abstract: Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datase…

    Submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2406.13642  [pdf, other]

    cs.CV

    SpatialBot: Precise Spatial Understanding with Vision Language Models

    Authors: Wenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao

    Abstract: Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding, however they are still struggling with spatial understanding which is the foundation of Embodied AI. In this paper, we propose SpatialBot for better spatial understanding by feeding both RGB and depth images. Additionally, we have constructed the SpatialQA dataset, which involves multi-level depth-related…

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.13586  [pdf, ps, other]

    cs.GT cs.AI

    Submodular Participatory Budgeting

    Authors: Jing Yuan, Shaojie Tang

    Abstract: Participatory budgeting refers to the practice of allocating public resources by collecting and aggregating individual preferences. Most existing studies in this field often assume an additive utility function, where each individual holds a private utility for each candidate project, and the total utility of a set of funded projects is simply the sum of the utilities of all projects. We argue that…

    Submitted 19 June, 2024; originally announced June 2024.

  9. arXiv:2406.11847  [pdf, other]

    cs.CY cs.LG

    Integrating behavior analysis with machine learning to predict online learning performance: A scientometric review and empirical study

    Authors: Jin Yuan, Xuelan Qiu, Jinran Wu, Jiesi Guo, Weide Li, You-Gan Wang

    Abstract: The interest in predicting online learning performance using ML algorithms has been steadily increasing. We first conducted a scientometric analysis to provide a systematic review of research in this area. The findings show that most existing studies apply the ML methods without considering learning behavior patterns, which may compromise the prediction accuracy and precision of the ML methods. Th…

    Submitted 27 March, 2024; originally announced June 2024.

    Comments: 23 pages, 12 figures, 9 tables. Submitted to Computer & Education; Authorship Contribution: Yuan: Literature review, Data curation, Methodology, Software. Qiu: Literature review, Conceptualization, Methodology, Original draft writing. Wu: Scientometric analysis, Methodology. Guo: Review and editing. Li: Comment draft, Funding seeking. Wang: Comment draft

  10. arXiv:2406.09870  [pdf, other]

    cs.LG cs.AI

    IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

    Authors: Jiawen Qin, Haonan Yuan, Qingyun Sun, Lyu** Xu, Jiaqi Yuan, Pengfeng Huang, Zhaonan Wang, Xingcheng Fu, Hao Peng, Jianxin Li, Philip S. Yu

    Abstract: Deep graph learning has gained grand popularity over the past years due to its versatility and success in representing graph data across a wide range of domains. However, the pervasive issue of imbalanced graph data distributions, where certain parts exhibit disproportionally abundant data while others remain sparse, undermines the efficacy of conventional graph learning algorithms, leading to bia…

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Preprint, under review)

  11. arXiv:2406.08864  [pdf]

    cs.LG cs.AI

    Research on Early Warning Model of Cardiovascular Disease Based on Computer Deep Learning

    Authors: Yuxiang Hu, **xin Hu, Ting Xu, Bo Zhang, Jiajie Yuan, Haozhang Deng

    Abstract: This project intends to study a cardiovascular disease risk early warning model based on one-dimensional convolutional neural networks. First, the missing values of 13 physiological and symptom indicators such as patient age, blood glucose, cholesterol, and chest pain were filled and Z-score was standardized. The convolutional neural network is converted into a 2D matrix, the convolution function…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 6 pages

  12. arXiv:2406.06890  [pdf, other]

    cs.CV

    Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

    Authors: Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

    Abstract: Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation wh…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Project page: https://yhzhai.github.io/mcm/

  13. arXiv:2406.04776  [pdf, ps, other]

    eess.SP cs.AI

    OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications

    Authors: Tongyang Xu, Shuangyang Li, Jinhong Yuan

    Abstract: Traditional communications focus on regular and orthogonal signal waveforms for simplified signal processing and improved spectral efficiency. In contrast, the next-generation communications would aim for irregular and non-orthogonal signal waveforms to introduce new capabilities. This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc b…

    Submitted 7 June, 2024; originally announced June 2024.

  14. arXiv:2406.03402  [pdf, other]

    cs.LG cs.AI

    Mixed-Precision Over-The-Air Federated Learning via Approximated Computing

    Authors: Jinsheng Yuan, Zhuangkun Wei, Weisi Guo

    Abstract: Over-the-Air Federated Learning (OTA-FL) has been extensively investigated as a privacy-preserving distributed learning mechanism. Realistic systems will see FL clients with diverse size, weight, and power configurations. A critical research gap in existing OTA-FL research is the assumption of homogeneous client computational bit precision. Indeed, many clients may exploit approximate computing (A…

    Submitted 4 June, 2024; originally announced June 2024.

  15. arXiv:2406.02126  [pdf, other]

    eess.SY cs.AI cs.LG cs.MA

    CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control Coordination

    Authors: Jinwei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

    Abstract: Traffic signal control (TSC) is a promising low-cost measure to enhance transportation efficiency without affecting existing road infrastructure. While various reinforcement learning-based TSC methods have been proposed and experimentally outperform conventional rule-based methods, none of them has been deployed in the real world. An essential gap lies in the oversimplification of the scenarios in…

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  16. arXiv:2406.01903  [pdf, ps, other]

    cs.IT

    Reverse PAC Codes: Look-ahead List Decoding

    Authors: Xinyi Gu, Mohammad Rowshan, Jinhong Yuan

    Abstract: Convolutional precoding in polarization-adjusted convolutional (PAC) codes is a recently introduced variant of polar codes. It has demonstrated an effective reduction in the number of minimum weight codewords (a.k.a error coefficient) of polar codes. This reduction has the potential to significantly improve the error correction performance. From a codeword formation perspective, this reduction has…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: To appear in the proceedings of ISIT'24. It contains 6 pages, 3 figures, and 1 table

  17. arXiv:2406.01900  [pdf, other]

    cs.CV

    Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

    Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equ…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://follow-your-emoji.github.io/

  18. arXiv:2405.17708  [pdf, other]

    cs.LG cs.AI stat.ML

    OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

    Authors: Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskill

    Abstract: Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been pro…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 22 pages

  19. arXiv:2405.17460  [pdf]

    cs.LG cs.AI cs.CV

    Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks

    Authors: Yafeng Yan, Shuyao He, Zhou Yu, Jiajie Yuan, Ziang Liu, Yan Chen

    Abstract: Aiming at the limitations of traditional medical decision system in processing large-scale heterogeneous medical data and realizing highly personalized recommendation, this paper introduces a personalized medical decision algorithm utilizing graph neural network (GNN). This research innovatively integrates graph neural network technology into the medical and health field, aiming to build a high-pr…

    Submitted 23 May, 2024; originally announced May 2024.

  20. arXiv:2405.15738  [pdf, other]

    cs.CV

    ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

    Authors: Chunjiang Ge, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, Bo Zheng

    Abstract: High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens. However, the redundancy in visual tokens is the key problem as it leads to more substantial compute. To mitigate this issue, we propose ConvLLaVA, which emplo…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages

  21. arXiv:2405.15199  [pdf, other]

    cs.CV

    ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

    Authors: Jingyuan Zhu, Shiyu Li, Yuxuan Liu, Ping Huang, Jiulong Shan, Huimin Ma, Jian Yuan

    Abstract: Modern diffusion-based image generative models have made significant progress and become promising to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on b…

    Submitted 24 May, 2024; originally announced May 2024.

  22. arXiv:2405.12868  [pdf, other]

    cs.LG cs.AI

    Equivariant Spatio-Temporal Attentive Graph Networks to Simulate Physical Dynamics

    Authors: Liming Wu, Zhichao Hou, Jirui Yuan, Yu Rong, Wenbing Huang

    Abstract: Learning to represent and simulate the dynamics of physical systems is a crucial yet challenging task. Existing equivariant Graph Neural Network (GNN) based methods have encapsulated the symmetry of physics, e.g., translations, rotations, etc., leading to better generalization ability. Nevertheless, their frame-to-frame formulation of the task overlooks the non-Markov property mainly incurre…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: The paper has been published to the conference of NeurIPS 2023

  23. arXiv:2405.10640  [pdf, other]

    cs.SI

    COMET: NFT Price Prediction with Wallet Profiling

    Authors: Tianfu Wang, Liwei Deng, Chao Wang, Jianxun Lian, Yue Yan, Nicholas Jing Yuan, Qi Zhang, Hui Xiong

    Abstract: As the non-fungible token (NFT) market flourishes, price prediction emerges as a pivotal direction for investors gaining valuable insight to maximize returns. However, existing works suffer from a lack of practical definitions and standardized evaluations, limiting their practical application. Moreover, the influence of users' multi-behaviour transactions that are publicly accessible on NFT price…

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 (ADS Track)

  24. Defect Category Prediction Based on Multi-Source Domain Adaptation

    Authors: Ying Xing, Mengci Zhao, Bin Yang, Yuwei Zhang, Wen** Li, Jiawei Gu, Jun Yuan

    Abstract: In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method-level code, lacking the ability to precisely classify specific defect categor…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 17 pages, in Chinese language, 8 figures (Due to length constraints of the abstract field, please refer to the original PDF file for the full content of abstract.)

    Journal ref: Journal of Software [2024]

  25. arXiv:2405.10276  [pdf, other]

    cs.CL cs.HC

    Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers

    Authors: Tuo Zhang, Jinyue Yuan, Salman Avestimehr

    Abstract: Numerous recent works aim to enhance the efficacy of Large Language Models (LLMs) through strategic prompting. In particular, the Optimization by PROmpting (OPRO) approach provides state-of-the-art performance by leveraging LLMs as optimizers where the optimization task is to find instructions that maximize the task accuracy. In this paper, we revisit OPRO for automated prompting with relatively s…

    Submitted 16 May, 2024; originally announced May 2024.

    Journal ref: ACL Findings 2024

  26. arXiv:2405.09819  [pdf]

    cs.SE cs.LG

    Automating the Training and Deployment of Models in MLOps by Integrating Systems with Machine Learning

    Authors: Penghao Liang, Bo Song, Xiaoan Zhan, Zhou Chen, Jiaqiang Yuan

    Abstract: This article introduces the importance of machine learning in real-world applications and explores the rise of MLOps (Machine Learning Operations) and its importance for solving challenges such as model deployment and performance monitoring. By reviewing the evolution of MLOps and its relationship to traditional software development methods, the paper proposes ways to integrate the system into mac…

    Submitted 16 May, 2024; originally announced May 2024.

  27. arXiv:2405.08295  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 page

  28. arXiv:2405.07547  [pdf, other]

    cs.IT eess.SP

    Channel Coding Toward 6G: Technical Overview and Outlook

    Authors: Mohammad Rowshan, Min Qiu, Yixuan Xie, Xinyi Gu, Jinhong Yuan

    Abstract: Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Th…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)

  29. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  30. arXiv:2405.03181  [pdf, other]

    cs.DC

    Collaborative Satellite Computing through Adaptive DNN Task Splitting and Offloading

    Authors: Shifeng Peng, Xuefeng Hou, Zhishu Shen, Qiushi Zheng, Jiong Jin, Atsushi Tagami, Jingling Yuan

    Abstract: Satellite computing has emerged as a promising technology for next-generation wireless networks. This innovative technology provides data processing capabilities, which facilitates the widespread implementation of artificial intelligence (AI)-based applications, especially for image processing tasks involving deep neural network (DNN). With the limited computing resources of an individual satellit…

    Submitted 20 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by 29th IEEE Symposium on Computers and Communications (ISCC)

  31. arXiv:2404.18620  [pdf, other]

    cs.CV

    FlexiFilm: Long Video Generation with Flexible Conditions

    Authors: Yichen Ouyang, Jianhao Yuan, Hao Zhao, Gaoang Wang, Bo Zhao

    Abstract: Generating long and consistent videos has emerged as a significant yet challenging problem. While most existing diffusion-based video generation models, derived from image generation models, demonstrate promising performance in generating short videos, their simple conditioning mechanism and sampling strategy, originally designed for image generation, cause severe performance degradation when adapte…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 9 pages, 9 figures

  32. arXiv:2404.18419  [pdf]

    cs.CV cs.AI

    Research on Intelligent Aided Diagnosis System of Medical Image Based on Computer Deep Learning

    Authors: Jiajie Yuan, Linxiao Wu, Yulu Gong, Zhou Yu, Ziang Liu, Shuyao He

    Abstract: This paper combines Struts and Hibernate two architectures together, using DAO (Data Access Object) to store and access data. Then a set of dual-mode humidity medical image library suitable for deep network is established, and a dual-mode medical image assisted diagnosis method based on the image is proposed. Through the test of various feature extraction methods, the optimal operating characteris…

    Submitted 29 April, 2024; originally announced April 2024.

  33. arXiv:2404.18409  [pdf, other]

    cs.CV

    PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images

    Authors: Jiquan Yuan, Fanyi Yang, Jihe Li, Xinyan Cao, Jinming Che, Jinlong Lin, Xixin Cao

    Abstract: In recent years, image generation technology has rapidly advanced, resulting in the creation of a vast array of AI-generated images (AIGIs). However, the quality of these AIGIs is highly inconsistent, with low-quality AIGIs severely impairing the visual experience of users. Due to the widespread application of AIGIs, the AI-generated image quality assessment (AIGIQA), aimed at evaluating the quali…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 12 pages. arXiv admin note: substantial text overlap with arXiv:2311.15556

  34. arXiv:2404.16164  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall

    Authors: Jiaqing Yuan, Lin Pan, Chung-Wei Hang, Jiang Guo, Jiarong Jiang, Bonan Min, Patrick Ng, Zhiguo Wang

    Abstract: Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretrai…

    Submitted 24 April, 2024; originally announced April 2024.

  35. arXiv:2404.14688  [pdf, other]

    cs.LG cs.AI cs.CE math.DS math.NA

    FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model

    Authors: Zezheng Song, Jiaxin Yuan, Haizhao Yang

    Abstract: In this paper, we propose a pre-trained foundation model FMint (Foundation Model based on Initialization), designed to speed up large-scale simulations of various differential equations with high accuracy via error correction. Human-designed simulation algorithms excel at capturing the fundamental physics of engineering problems, but often need to balan…

    Submitted 22 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  36. arXiv:2404.13311  [pdf, other]

    cs.CV

    STAT: Towards Generalizable Temporal Action Localization

    Authors: Yangcen Liu, Ziyi Liu, Yuanhao Zhai, Wen Li, David Doermann, Junsong Yuan

    Abstract: Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action instances with only video-level labels. Despite the significant progress, existing methods suffer from severe performance degradation when transferring to different distributions and thus may hardly adapt to real-world scenarios. To address this problem, we propose the Generalizable Temporal Action Localiz…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 14 pages, LaTeX;

  37. arXiv:2404.12633  [pdf, other]

    cs.AI cs.NI

    FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation

    Authors: Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas Jing Yuan, Hui Xiong

    Abstract: Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by the unidirectional action design and one-size-fits-all training strategy, result…

    Submitted 1 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  38. arXiv:2404.10934  [pdf, other]

    cs.LG cs.AI cs.CL

    Shears: Unstructured Sparsity with Neural Low-rank Adapter Search

    Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain

    Abstract: Recently, several approaches successfully demonstrated that weight-sharing Neural Architecture Search (NAS) can effectively explore a search space of elastic low-rank adapters (LoRA), allowing the parameter-efficient fine-tuning (PEFT) and compression of large language models. In this paper, we introduce a novel approach called Shears, demonstrating how the integration of cost-effective sparsity a…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Industry Track)

  39. arXiv:2404.10443  [pdf, ps, other]

    cs.LG cs.AI

    AGHINT: Attribute-Guided Representation Learning on Heterogeneous Information Networks with Transformer

    Authors: Jinhui Yuan, Shan Lu, Peibo Duan, Jieyue He

    Abstract: Recently, heterogeneous graph neural networks (HGNNs) have achieved impressive success in representation learning by capturing long-range dependencies and heterogeneity at the node level. However, few existing studies have delved into the utilization of node attributes in heterogeneous information networks (HINs). In this paper, we investigate the impact of inter-node attribute disparities on HGNN…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  40. arXiv:2404.09447  [pdf, other]

    cs.CV cs.LG

    kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies

    Authors: Zhongrui Gui, Shuyang Sun, Runjia Li, Jianhao Yuan, Zhaochong An, Karsten Roth, Ameya Prabhu, Philip Torr

    Abstract: Rapid advancements in continual segmentation have yet to bridge the gap of scaling to large continually expanding vocabularies under compute-constrained scenarios. We discover that traditional continual training leads to catastrophic forgetting under compute constraints, unable to outperform zero-shot segmentation methods. We introduce a novel strategy for semantic and panoptic segmentation with z…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 10 pages, 3 figures

  41. arXiv:2404.09322  [pdf]

    cs.DC cs.AI

    The intelligent prediction and assessment of financial information risk in the cloud computing model

    Authors: Yufu Wang, Mingwei Zhu, Jiaqiang Yuan, Guanghui Wang, Hong Zhou

    Abstract: Cloud computing (cloud computing) is a kind of distributed computing, referring to the network "cloud" will be a huge data calculation and processing program into countless small programs, and then, through the system composed of multiple servers to process and analyze these small programs to get the results and return to the user. This report explores the intersection of cloud computing and finan…

    Submitted 14 April, 2024; originally announced April 2024.

  42. arXiv:2404.07855  [pdf, other]

    cs.CV

    Resolve Domain Conflicts for Generalizable Remote Physiological Measurement

    Authors: Weiyu Sun, Xinyu Zhang, Hao Lu, Ying Chen, Yun Ge, Xiaolin Huang, Jie Yuan, Yingcong Chen

    Abstract: Remote photoplethysmography (rPPG) technology has become increasingly popular due to its non-invasive monitoring of various physiological indicators, making it widely applicable in multimedia interaction, healthcare, and emotion analysis. Existing rPPG methods utilize multiple datasets for training to enhance the generalizability of models. However, they often overlook the underlying conflict issu…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by ACM MM 2023

  43. arXiv:2404.04066  [pdf, other]

    cs.RO cs.CL cs.HC

    VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots

    Authors: Akhil Padmanabha, Jessie Yuan, Janavi Gupta, Zulekha Karachiwalla, Carmel Majidi, Henny Admoni, Zackory Erickson

    Abstract: Physically assistive robots present an opportunity to significantly increase the well-being and independence of individuals with motor impairments or other forms of disability who are unable to complete activities of daily living. Speech interfaces, especially ones that utilize Large Language Models (LLMs), can enable individuals to effectively and naturally communicate high-level commands and nua…

    Submitted 5 April, 2024; originally announced April 2024.

  44. arXiv:2404.02249  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI

    RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction

    Authors: Yushen Li, Jinpeng Wang, Tao Dai, Jieming Zhu, Jun Yuan, Rui Zhang, Shu-Tao Xia

    Abstract: Predicting click-through rates (CTR) is a fundamental task for Web applications, where a key issue is to devise effective models for feature interactions. Current methodologies predominantly concentrate on modeling feature interactions within an individual sample, while overlooking the potential cross-sample relationships that can serve as a reference context to enhance the prediction. To make up…

    Submitted 4 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to The ACM Web Conference 2024 (WWW'24, short paper). Data and code are available

  45. arXiv:2404.01988  [pdf, other]

    cs.CV

    Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection

    Authors: Jicheng Yuan, Anh Le-Tuan, Manfred Hauswirth, Danh Le-Phuoc

    Abstract: Unsupervised Domain Adaptation (UDA) has shown significant advancements in object detection under well-lit conditions; however, its performance degrades notably in low-visibility scenarios, especially at night, posing challenges not only for its adaptability in low signal-to-noise ratio (SNR) conditions but also for the reliability and efficiency of automated vehicles. To address this problem, we…

    Submitted 8 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Code is available at https://github.com/jichengyuan/Cooperitive_Students

  46. arXiv:2404.01945  [pdf, other]

    cs.CV

    Event-assisted Low-Light Video Object Segmentation

    Authors: Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun

    Abstract: In the realm of video object segmentation (VOS), the challenge of operating under low-light conditions persists, resulting in notably degraded image quality and compromised accuracy when comparing query and memory frames for similarity computation. Event cameras, characterized by their high dynamic range and ability to capture motion information of objects, offer promise in enhancing object visibi…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  47. arXiv:2404.01582  [pdf, other]

    cs.CL cs.AI cs.IR

    BERT-Enhanced Retrieval Tool for Homework Plagiarism Detection System

    Authors: Jiarong Xian, Jibao Yuan, Peiwei Zheng, Dexian Chen

    Abstract: Text plagiarism detection task is a common natural language processing task that aims to detect whether a given text contains plagiarism or copying from other texts. In existing research, detection of high level plagiarism is still a challenge due to the lack of high quality datasets. In this paper, we propose a plagiarized text data generation method based on GPT-3.5, which produces 32,927 pairs…

    Submitted 1 April, 2024; originally announced April 2024.

  48. arXiv:2403.16954  [pdf, other]

    cs.CV

    Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance

    Authors: Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

    Abstract: Large-scale text-to-image diffusion models have achieved great success in synthesizing high-quality and diverse images given target text prompts. Despite the revolutionary image generation ability, current state-of-the-art models still struggle to deal with multi-concept generation accurately in many cases. This phenomenon is known as "concept bleeding" and displays as the unexpected overlapping…

    Submitted 25 March, 2024; originally announced March 2024.

  49. arXiv:2403.16854  [pdf, other]

    cs.CL cs.AI

    An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

    Authors: Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang

    Abstract: We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM like generating new tokens. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instru…

    Submitted 11 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  50. arXiv:2403.14192  [pdf, ps, other]

    cs.IT eess.SP

    Fundamentals of Delay-Doppler Communications: Practical Implementation and Extensions to OTFS

    Authors: Shuangyang Li, Peter Jung, Weijie Yuan, Zhiqiang Wei, Jinhong Yuan, Baoming Bai, Giuseppe Caire

    Abstract: The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis…

    Submitted 21 March, 2024; originally announced March 2024.