Showing 1–50 of 518 results for author: Shen, X

Searching in archive cs.
  1. arXiv:2407.00191  [pdf, other]

    cs.CL

    MetaKP: On-Demand Keyphrase Generation

    Authors: Di Wu, Xiaoxian Shen, Kai-Wei Chang

    Abstract: Traditional keyphrase prediction methods predict a single set of keyphrases per document, failing to cater to the diverse needs of users and downstream applications. To bridge the gap, we introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents. For this task, we present MetaKP, a large-scale benchmark comprising four…

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2407.00028  [pdf, other]

    q-bio.NC cs.LG stat.AP

    Harnessing XGBoost for Robust Biomarker Selection of Obsessive-Compulsive Disorder (OCD) from Adolescent Brain Cognitive Development (ABCD) data

    Authors: Xinyu Shen, Qimin Zhang, Huili Zheng, Weiwei Qi

    Abstract: This study evaluates the performance of various supervised machine learning models in analyzing highly correlated neural signaling data from the Adolescent Brain Cognitive Development (ABCD) Study, with a focus on predicting obsessive-compulsive disorder scales. We simulated a dataset to mimic the correlation structures commonly found in imaging data and evaluated logistic regression, elastic netw…

    Submitted 14 May, 2024; originally announced July 2024.

  3. arXiv:2406.18134  [pdf, other]

    cs.CL

    Assessing "Implicit" Retrieval Robustness of Large Language Models

    Authors: Xiaoyu Shen, Rexhina Blloshmi, Dawei Zhu, Jiahuan Pei, Wei Zhang

    Abstract: Retrieval-augmented generation has gained popularity as a framework to enhance large language models with external knowledge. However, its effectiveness hinges on the retrieval robustness of the model. If the model lacks retrieval robustness, its performance is constrained by the accuracy of the retriever, resulting in significant compromises when the retrieved context is irrelevant. In this paper…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.16690  [pdf, other]

    cs.CL

    Scaling Laws for Linear Complexity Language Models

    Authors: Xuyang Shen, Dong Li, Ruitao Leng, Zhen Qin, Weigao Sun, Yiran Zhong

    Abstract: The interest in linear complexity models for large language models is on the rise, although their scaling capacity remains uncertain. In this study, we present the scaling laws for linear complexity language models to establish a foundation for their scalability. Specifically, we examine the scaling behaviors of three efficient linear architectures. These include TNL, a linear attention model with…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author

  5. arXiv:2406.15523  [pdf, other]

    cs.LG stat.ML

    Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

    Authors: Yili Wang, Yixin Liu, Xu Shen, Chenyu Li, Kaize Ding, Rui Miao, Ying Wang, Shirui Pan, Xin Wang

    Abstract: To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating…

    Submitted 21 June, 2024; originally announced June 2024.

  6. arXiv:2406.14887  [pdf, other]

    cs.CL

    InternLM-Law: An Open Source Chinese Legal Large Language Model

    Authors: Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge

    Abstract: While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., l…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Our dataset, code and models will be released at https://github.com/InternLM/InternLM-Law

  7. arXiv:2406.13964  [pdf, other]

    cs.NI

    Hierarchical Micro-Segmentations for Zero-Trust Services via Large Language Model (LLM)-enhanced Graph Diffusion

    Authors: Yinqiu Liu, Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin Shen

    Abstract: In the rapidly evolving Next-Generation Networking (NGN) era, the adoption of zero-trust architectures has become increasingly crucial to protect security. However, provisioning zero-trust services in NGNs poses significant challenges, primarily due to the environmental complexity and dynamics. Motivated by these challenges, this paper explores efficient zero-trust service provisioning using hiera…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  8. arXiv:2406.13113  [pdf, other]

    cs.CV cs.AI q-bio.NC

    CU-Net: a U-Net architecture for efficient brain-tumor segmentation on BraTS 2019 dataset

    Authors: Qimin Zhang, Weiwei Qi, Huili Zheng, Xinyu Shen

    Abstract: Accurately segmenting brain tumors from MRI scans is important for developing effective treatment plans and improving patient outcomes. This study introduces a new implementation of the Columbia-University-Net (CU-Net) architecture for brain tumor segmentation using the BraTS 2019 dataset. The CU-Net model has a symmetrical U-shaped structure and uses convolutional layers, max pooling, and upsampl…

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.11410  [pdf, other]

    cs.CL cs.AI

    HARE: HumAn pRiors, a key to small language model Efficiency

    Authors: Lingyun Zhang, Bin Jin, Gaojian Ge, Lunhui Liu, Xuewen Shen, Mingyong Wu, Houqian Zhang, Yongneng Jiang, Shiqi Chen, Shi Pu

    Abstract: Human priors play a crucial role in efficiently utilizing data in deep learning. However, with the development of large language models (LLMs), there is an increasing emphasis on scaling both model size and data volume, which often diminishes the importance of human priors in data construction. Influenced by these trends, existing Small Language Models (SLMs) mainly rely on web-scraped large-scale…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  11. arXiv:2406.09416  [pdf, other]

    cs.CV

    Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

    Authors: Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen

    Abstract: This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Introducing DiMR, a new diffusion backbone that surpasses all existing image generation models of various sizes on ImageNet 256 with only 505M parameters. Project page: https://qihao067.github.io/projects/DiMR

  12. arXiv:2406.07857  [pdf, other]

    eess.SY cs.LG cs.NI

    Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges

    Authors: Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, Xuemin Shen

    Abstract: This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration…

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  13. arXiv:2406.07550  [pdf, other]

    cs.CV

    An Image is Worth 32 Tokens for Reconstruction and Generation

    Authors: Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

    Abstract: Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: A compact 1D Image Tokenization method, leading to SOTA generation performance while being substantially faster. Project page at https://yucornetto.github.io/projects/titok.html

  14. arXiv:2406.06211  [pdf, other]

    cs.CV

    iMotion-LLM: Motion Prediction Instruction Tuning

    Authors: Abdulwahab Felemban, Eslam Mohamed Bakr, Xiaoqian Shen, Jian Ding, Abduallah Mohamed, Mohamed Elhoseiny

    Abstract: We introduce iMotion-LLM: a Multimodal Large Language Model (LLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios. Different from conventional motion prediction approaches, iMotion-LLM capitalizes on textual instructions as key inputs for generating contextually relevant trajectories. By enriching the real-world driving scenarios in the Waymo Open Dataset with tex…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  15. arXiv:2406.05516  [pdf, other]

    cs.LG cs.AI cs.CL

    Verbalized Probabilistic Graphical Modeling with Large Language Models

    Authors: Hengguan Huang, Xing Shen, Songtao Wang, Dianbo Liu, Hao Wang

    Abstract: Faced with complex problems, the human brain demonstrates a remarkable capacity to transcend sensory input and form latent understandings of perceived world patterns. However, this cognitive capacity is not explicitly considered or encoded in current large language models (LLMs). As a result, LLMs often struggle to capture latent structures and model uncertainty in complex compositional reasoning…

    Submitted 8 June, 2024; originally announced June 2024.

  16. arXiv:2406.04627  [pdf, other]

    cs.LG cs.AI

    Denoising-Aware Contrastive Learning for Noisy Time Series

    Authors: Shuang Zhou, Daochen Zha, Xiao Shen, Xiao Huang, Rui Zhang, Fu-Lai Chung

    Abstract: Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods befo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  17. arXiv:2406.02541  [pdf, other]

    cs.CV

    Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

    Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

    Abstract: Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimizing process tailo…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project page at https://video-3dgs-project.github.io/

  18. arXiv:2405.21022  [pdf, other]

    cs.CL cs.CV

    You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

    Authors: Zhen Qin, Yuxin Mao, Xuyang Shen, Dong Li, Jing Zhang, Yuchao Dai, Yiran Zhong

    Abstract: Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed. However, the inherent decay mechanism in linear attention presents challenges when applied to multi-dimensional sequence modeling tasks, such as image processing and multi-modal learning. In these scenarios, the utilization of sequential scanning to establis…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author. The code is available at https://github.com/OpenNLPLab/LightNet

  19. arXiv:2405.19103  [pdf, other]

    cs.CR cs.LG

    Voice Jailbreak Attacks Against GPT-4o

    Authors: Xinyue Shen, Yixin Wu, Michael Backes, Yang Zhang

    Abstract: Recently, the concept of artificial assistants has evolved from science fiction into real-world applications. GPT-4o, the newest multimodal large language model (MLLM) across audio, vision, and text, has further blurred the line between fiction and reality by enabling more natural human-computer interactions. However, the advent of GPT-4o's voice mode may also introduce a new attack surface. In th…

    Submitted 29 May, 2024; originally announced May 2024.

  20. arXiv:2405.18167  [pdf, other]

    eess.IV cs.CV

    Confidence-aware multi-modality learning for eye disease screening

    Authors: Ke Zou, Tian Lin, Zongbo Han, Meng Wang, Xuedong Yuan, Haoyu Chen, Changqing Zhang, Xiaojing Shen, Huazhu Fu

    Abstract: Multi-modal ophthalmic image classification plays a key role in diagnosing eye diseases, as it integrates information from different sources to complement their respective performances. However, recent improvements have mainly focused on accuracy, often neglecting the importance of confidence and robustness in predictions for diverse modalities. In this study, we propose a novel multi-modality evi…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures, 9 tables

  21. Adaptive Device-Edge Collaboration on DNN Inference in AIoT: A Digital Twin-Assisted Approach

    Authors: Shisheng Hu, Mushu Li, Jie Gao, Conghao Zhou, Xuemin Shen

    Abstract: Device-edge collaboration on deep neural network (DNN) inference is a promising approach to efficiently utilizing network resources for supporting artificial intelligence of things (AIoT) applications. In this paper, we propose a novel digital twin (DT)-assisted approach to device-edge collaboration on DNN inference that determines whether and when to stop local inference at a device and upload th…

    Submitted 27 May, 2024; originally announced May 2024.

    Journal ref: IEEE Internet Things J. (Volume: 11, Issue: 7, 01 April 2024)

  22. arXiv:2405.17383  [pdf, other]

    cs.CL

    Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

    Authors: Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong

    Abstract: We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint.…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Technical report. Yiran Zhong is the corresponding author

  23. arXiv:2405.17381  [pdf, other]

    cs.CL

    Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

    Authors: Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

    Abstract: We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. Due to the issue with cumulative summation operations (cumsum), previous linear attention implementations cannot achieve their theoretical advantage in a causal setting. However, this issue can be effectively solved by utili…

    Submitted 20 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024. Yiran Zhong is the corresponding author. Code is released at github.com/OpenNLPLab/TransnormerLLM

  24. arXiv:2405.16837  [pdf, ps, other]

    stat.ML cs.LG

    Enhancing Accuracy in Generative Models via Knowledge Transfer

    Authors: Xinyu Tian, Xiaotong Shen

    Abstract: This paper investigates the accuracy of generative models and the impact of knowledge transfer on their generation precision. Specifically, we examine a generative model for a target task, fine-tuned using a pre-trained model from a source task. Building on the "Shared Embedding" concept, which bridges the source and target tasks, we introduce a novel framework for transfer learning under distribu…

    Submitted 27 May, 2024; originally announced May 2024.

  25. arXiv:2405.14371  [pdf, other]

    cs.DC

    EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

    Authors: Mingjin Zhang, Jiannong Cao, Xiaoming Shen, Zeyang Cui

    Abstract: Large language models (LLMs) have shown great potential in natural language processing and content generation. However, current LLMs heavily rely on cloud computing, leading to prolonged latency, high bandwidth cost, and privacy concerns. Edge computing is promising to address such concerns by deploying LLMs on edge devices, closer to data sources. Some works try to leverage model quantization to…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Under review

  26. arXiv:2405.13857  [pdf, other]

    cs.CR cs.CY cs.HC

    What Do Privacy Advertisements Communicate to Consumers?

    Authors: Xiaoxin Shen, Eman Alashwali, Lorrie Faith Cranor

    Abstract: When companies release marketing materials aimed at promoting their privacy practices or highlighting specific privacy features, what do they actually communicate to consumers? In this paper, we explore the impact of privacy marketing on: (1) consumers' attitudes toward the organizations providing the campaigns, (2) overall privacy awareness, and (3) the actionability of suggested privacy advice.…

    Submitted 24 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: This document is the author's manuscript for a paper to appear in Proceedings on Privacy Enhancing Technologies 2024(4)

  27. arXiv:2405.12496  [pdf, other]

    eess.AS cs.NI cs.SD eess.SP

    A Survey of Integrating Wireless Technology into Active Noise Control

    Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

    Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead…

    Submitted 21 May, 2024; originally announced May 2024.

  28. arXiv:2405.08001  [pdf, other]

    math.OC cs.GR

    Preconditioned Nonlinear Conjugate Gradient Method for Real-time Interior-point Hyperelasticity

    Authors: Xing Shen, Runyuan Cai, Mengxiao Bi, Tangjie Lv

    Abstract: The linear conjugate gradient method is widely used in physical simulation, particularly for solving large-scale linear systems derived from Newton's method. The nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization, which is extensively utilized in solving practical large-scale unconstrained optimization problems. However, it is rarely discussed i…

    Submitted 6 May, 2024; originally announced May 2024.

  29. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  30. arXiv:2405.03946  [pdf]

    cs.SI

    Association between centrality and flourishing trait: analyzing student co-occurrence networks drawn from dining activities

    Authors: Yi Cao, Shimin Cai, Xiaorong Shen, Tao Zhou

    Abstract: Comprehending the association between social capabilities and individual psychological traits is paramount for educational administrators. Presently, many studies heavily depend on online questionnaires and self-reported data, while analysis of the connection between offline social networks and mental health status remains scarce. By leveraging a public dataset encompassing on-campus dining activi…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures, 1 Table

  31. arXiv:2405.03486  [pdf, other]

    cs.CR cs.CV cs.SI

    UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

    Authors: Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang

    Abstract: Image safety classifiers play an important role in identifying and mitigating the spread of unsafe images online (e.g., images including violence, hateful rhetoric, etc.). At the same time, with the advent of text-to-image models and increasing concerns about the safety of AI models, developers are increasingly relying on image safety classifiers to safeguard their models. Yet, the performance of…

    Submitted 6 May, 2024; originally announced May 2024.

  32. arXiv:2405.01221  [pdf, other]

    cs.NI

    A Survey on Semantic Communication Networks: Architecture, Security, and Privacy

    Authors: Shaolong Guo, Yuntao Wang, Ning Zhang, Zhou Su, Tom H. Luan, Zhiyi Tian, Xuemin Shen

    Abstract: Semantic communication, emerging as a breakthrough beyond the classical Shannon paradigm, aims to convey the essential meaning of source data rather than merely focusing on precise yet content-agnostic bit transmission. By interconnecting diverse intelligent agents (e.g., autonomous vehicles and VR devices) via semantic communications, the semantic communication network (SemComNet) supports seman…

    Submitted 2 May, 2024; originally announced May 2024.

  33. arXiv:2404.16812  [pdf, other]

    cs.DC

    ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

    Authors: Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

    Abstract: Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some i…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24)

  34. arXiv:2404.15625  [pdf, other]

    cs.LG

    Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models

    Authors: Xu Shen, Yili Wang, Kaixiong Zhou, Shirui Pan, Xin Wang

    Abstract: The open-world test dataset is often mixed with out-of-distribution (OOD) samples, where the deployed models will struggle to make accurate predictions. Traditional detection methods need to trade off OOD detection and in-distribution (ID) classification performance since they share the same representation learning model. In this work, we propose to detect OOD molecules by adopting an auxiliary di…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures

  35. arXiv:2404.14720  [pdf, other]

    cs.CR

    Incorporating Gradients to Rules: Towards Lightweight, Adaptive Provenance-based Intrusion Detection

    Authors: Lingzhi Wang, Xiangmin Shen, Weijian Li, Zhenyuan Li, R. Sekar, Han Liu, Yan Chen

    Abstract: As cyber-attacks become increasingly sophisticated and stealthy, it becomes more imperative and challenging to detect intrusion from normal behaviors. Through fine-grained causality analysis, provenance-based intrusion detection systems (PIDS) demonstrated a promising capacity to distinguish benign and malicious behaviors, attracting widespread attention from both industry and academia. Among dive…

    Submitted 22 April, 2024; originally announced April 2024.

  36. arXiv:2404.14381  [pdf, other]

    cs.CV cs.MM

    TAVGBench: Benchmarking Text to Audible-Video Generation

    Authors: Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

    Abstract: The Text to Audible-Video Generation (TAVG) task involves generating videos with accompanying audio based on text descriptions. Achieving this requires skillful alignment of both audio and video elements. To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 th…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Technical Report. Project page: https://github.com/OpenNLPLab/TAVGBench

  37. arXiv:2404.14122  [pdf, other]

    cs.CL

    Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?

    Authors: Dawei Zhu, Pinzhen Chen, Miaoran Zhang, Barry Haddow, Xiaoyu Shen, Dietrich Klakow

    Abstract: Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality. In the current practice of fine-tuning large language models (LLMs) for translation, we revisit the importance of all these factors. We find that LLMs display strong translation capability after being fine-tuned on as fe…

    Submitted 22 April, 2024; originally announced April 2024.

  38. arXiv:2404.13898  [pdf, other]

    cs.NI

    Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering

    Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Ping Zhang, Xuemin Shen

    Abstract: Employing massive Mobile AI-Generated Content (AIGC) Service Providers (MASPs) with powerful models, high-quality AIGC services can become accessible for resource-constrained end users. However, this advancement, referred to as mobile AIGC, also introduces a significant challenge: users should download large AIGC outputs from the MASPs, leading to substantial bandwidth consumption and potential tr…

    Submitted 22 April, 2024; originally announced April 2024.

  39. arXiv:2404.13749  [pdf, other]

    cs.NI

    Efficient Digital Twin Data Processing for Low-Latency Multicast Short Video Streaming

    Authors: Xinyu Huang, Shisheng Hu, Mushu Li, Cheng Huang, Xuemin Shen

    Abstract: In this paper, we propose a novel efficient digital twin (DT) data processing scheme to reduce service latency for multicast short video streaming. Particularly, DT is constructed to emulate and analyze user status for multicast group update and swipe feature abstraction. Then, a precise measurement model of DT data processing is developed to characterize the relationship among DT model size, user…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 6 pages, 6 figures, submitted to ICCC 2024

  40. arXiv:2404.13649  [pdf, other]

    stat.ML cs.LG stat.ME

    Distributional Principal Autoencoders

    Authors: Xinwei Shen, Nicolai Meinshausen

    Abstract: Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data. However, we argue that it is possible to have reconstructed data identically distributed as the original data, irrespective of the retained dimension or the specific mapping. This can be achieved by learning a distributional model that matches the conditional distrib…

    Submitted 21 April, 2024; originally announced April 2024.

  41. arXiv:2404.13528  [pdf, other]

    cs.LG cs.AI cs.DC

    SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

    Authors: Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

    Abstract: This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, w…

    Submitted 21 April, 2024; originally announced April 2024.

  42. arXiv:2404.12567  [pdf]

    cs.HC

    Impact of Vibrotactile Triggers on Mental Well-Being through ASMR Experience in VR

    Authors: Danyang Peng, Tanner Person, Ximing Shen, Yun Suen Pai, Giulia Barbareschi, Shengyin Li, Kouta Minamizawa

    Abstract: Watching Autonomous Sensory Meridian Response (ASMR) videos is a popular approach to support mental well-being, as the triggered ASMR tingling sensation supports de-stressing and regulating emotions. Therefore, there is increasing research on how to efficiently trigger ASMR tingling sensation. Tactile sensation remains unexplored because current popular ASMR approaches focus on the visual and audi…

    Submitted 18 April, 2024; originally announced April 2024.

  43. arXiv:2404.12241  [pdf, other]

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  44. arXiv:2404.11288  [pdf, other]

    cs.CL

    A Preference-driven Paradigm for Enhanced Translation with Large Language Models

    Authors: Dawei Zhu, Sony Trenous, Xiaoyu Shen, Dietrich Klakow, Bill Byrne, Eva Hasler

    Abstract: Recent research has shown that large language models (LLMs) can achieve remarkable translation performance through supervised fine-tuning (SFT) using only a small amount of parallel data. However, SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references. Hence, the assistance from SFT often reaches a platea…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 (long, main)

  45. arXiv:2404.11276  [pdf, other]

    cs.AI q-fin.GN

    RD2Bench: Toward Data-Centric Automatic R&D

    Authors: Haotian Chen, Xinjie Shen, Zeqi Ye, Xiao Yang, Xu Yang, Weiqing Liu, Jiang Bian

    Abstract: The progress of humanity is driven by those successful discoveries accompanied by countless failed experiments. Researchers often seek the potential research directions by reading and then verifying them through experiments. The process imposes a significant burden on researchers. In the past decade, the data-driven black-box deep learning method demonstrates its effectiveness in a wide range of r…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

  46. arXiv:2404.08677  [pdf, other]

    cs.IR cs.AI cs.CL

    PMG : Personalized Multimodal Generation with Large Language Models

    Authors: Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao

    Abstract: The emergence of large language models (LLMs) has revolutionized the capabilities of text comprehension and generation. Multi-modal generation attracts great attention from both the industry and academia, but there is little work on personalized generation, which has important applications such as recommender systems. This paper proposes the first method for personalized multimodal generation usin…

    Submitted 6 April, 2024; originally announced April 2024.

  47. arXiv:2404.08639  [pdf, other]

    cs.CV

    COCONut: Modernizing COCO Segmentation

    Authors: Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen

    Abstract: In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coar…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR2024, data available at https://xdeng7.github.io/coconut.github.io/

  48. arXiv:2404.07904  [pdf, other]

    cs.CL

    HGRN2: Gated Linear RNNs with State Expansion

    Authors: Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

    Abstract: Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so tha…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Technical Report. Yiran Zhong is the corresponding author. The source code is available at https://github.com/OpenNLPLab/HGRN2

  49. arXiv:2404.06798  [pdf, other]

    cs.CV

    MedRG: Medical Report Grounding with Multi-modal Large Language Model

    Authors: Ke Zou, Yang Bai, Zhihao Chen, Yang Zhou, Yidi Chen, Kai Ren, Meng Wang, Xuedong Yuan, Xiaojing Shen, Huazhu Fu

    Abstract: Medical Report Grounding is pivotal in identifying the most relevant regions in medical images based on a given phrase query, a critical aspect in medical image analysis and radiological diagnosis. However, prevailing visual grounding approaches necessitate the manual extraction of key phrases from medical reports, imposing substantial burdens on both system efficiency and physicians. In this pape…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages, 4 figures

  50. arXiv:2404.06182  [pdf, other]

    cs.NI

    Streamlined Transmission: A Semantic-Aware XR Deployment Framework Enhanced by Generative AI

    Authors: Wanting Yang, Zehui Xiong, Tony Q. S. Quek, Xuemin Shen

    Abstract: In the era of 6G, featuring compelling visions of digital twins and metaverses, Extended Reality (XR) has emerged as a vital conduit connecting the digital and physical realms, garnering widespread interest. Ensuring a fully immersive wireless XR experience stands as a paramount technical necessity, demanding the liberation of XR from the confines of wired connections. In this paper, we first intr…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Under review with IEEE Network