Skip to main content

Showing 1–50 of 2,635 results for author: liu, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.03045  [pdf, other

    cs.HC cs.CL cs.LG

    JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets

    Authors: Zhihua **, Shiyi Liu, Haotian Li, Xun Zhao, Huamin Qu

    Abstract: Large Language Models (LLMs) have gained significant attention but also raised concerns due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards LLMs, have appeared and constantly evolved to breach the safety protocols of LLMs. To address this issue, LLMs are regularly updated with safety patches based on reported jailbreak prompts. However, malicious users often… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 9 figures

  2. arXiv:2407.02906  [pdf, other

    cs.CV

    Single Image Rolling Shutter Removal with Diffusion Models

    Authors: Zhanglei Yang, Haipeng Li, Mingbo Hong, Bing Zeng, Shuaicheng Liu

    Abstract: We present RS-Diffusion, the first Diffusion Models-based method for single-frame Rolling Shutter (RS) correction. RS artifacts compromise visual quality of frames due to the row wise exposure of CMOS sensors. Most previous methods have focused on multi-frame approaches, using temporal information from consecutive frames for the motion rectification. However, few approaches address the more challe… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.02759  [pdf

    cs.LG cs.AI

    Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System

    Authors: Yang Zhao, Chang Zhou, ** Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li

    Abstract: This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a sh… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 5th International Conference on Artificial Intelligence and Electromechanical Automation IEEE (ISBN: 979-8-3503-6617-4)

  4. arXiv:2407.02483  [pdf, other

    cs.CL cs.AI

    MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

    Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang

    Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To brid… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  5. arXiv:2407.02482  [pdf, other

    cs.CV

    Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

    Authors: Fei Shen, Hu Ye, Sibo Liu, Jun Zhang, Cong Wang, Xiao Han, Wei Yang

    Abstract: Recent research showcases the considerable potential of conditional diffusion models for generating consistent stories. However, current methods, which predominantly generate stories in an autoregressive and excessively caption-dependent manner, often underrate the contextual consistency and relevance of frames during sequential generation. To address this, we propose a novel Rich-contextual Condi… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  6. arXiv:2407.02390  [pdf, other

    cs.DC cs.LG

    Uncertainty-Aware Decarbonization for Datacenters

    Authors: Amy Li, Sihang Liu, Yi Ding

    Abstract: This paper represents the first effort to quantify uncertainty in carbon intensity forecasting for datacenter decarbonization. We identify and analyze two types of uncertainty -- temporal and spatial -- and discuss their system implications. To address the temporal dynamics in quantifying uncertainty for carbon intensity forecasting, we introduce a conformal prediction-based framework. Evaluation… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  7. arXiv:2407.02228  [pdf, other

    cs.CV cs.AI

    MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

    Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, Ying-Cong Chen

    Abstract: Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-t… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  8. Unveiling Global Interactive Patterns across Graphs: Towards Interpretable Graph Neural Networks

    Authors: Yuwen Wang, Shunyu Liu, Tongya Zheng, Kaixuan Chen, Mingli Song

    Abstract: Graph Neural Networks (GNNs) have emerged as a prominent framework for graph mining, leading to significant advances across various domains. Stemmed from the node-wise representations of GNNs, existing explanation studies have embraced the subgraph-specific viewpoint that attributes the decision results to the salient features and local structures of nodes. However, graph-level tasks necessitate l… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted in KDD2024

  9. arXiv:2407.01511  [pdf, other

    cs.AI

    CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

    Authors: Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the compl… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.00956  [pdf, other

    cs.LG

    A Closer Look at Deep Learning on Tabular Data

    Authors: Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan

    Abstract: Tabular data is prevalent across various domains in machine learning. Although Deep Neural Network (DNN)-based methods have shown promising performance comparable to tree-based ones, in-depth evaluation of these methods is challenging due to varying performance ranks across diverse datasets. In this paper, we propose a comprehensive benchmark comprising 300 tabular datasets, covering a wide range… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.00928  [pdf, other

    cs.LG cs.CL

    FoldGPT: Simple and Effective Large Language Model Compression Scheme

    Authors: Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen

    Abstract: The demand for deploying large language models(LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and found that the outputs of mos… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  12. arXiv:2407.00462  [pdf, other

    cs.CV cs.AI

    pFLFE: Cross-silo Personalized Federated Learning via Feature Enhancement on Medical Image Segmentation

    Authors: Luyuan Xie, Manqing Lin, Siyuan Liu, ChenMing Xu, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

    Abstract: In medical image segmentation, personalized cross-silo federated learning (FL) is becoming popular for utilizing varied data across healthcare settings to overcome data scarcity and privacy concerns. However, existing methods often suffer from client drift, leading to inconsistent performance and delayed training. We propose a new framework, Personalized Federated Learning via Feature Enhancement… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  13. arXiv:2407.00356  [pdf, other

    cs.LG cs.CV

    Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization

    Authors: Hongjun Choi, Jayaraman J. Thiagarajan, Ruben Glatt, Shusen Liu

    Abstract: In this work, we investigate the fundamental trade-off regarding accuracy and parameter efficiency in the parameterization of neural network weights using predictor networks. We present a surprising finding that, when recovering the original model accuracy is the sole objective, it can be achieved effectively through the weight reconstruction objective alone. Additionally, we explore the underlyin… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  14. arXiv:2407.00016  [pdf, other

    cs.DC

    AdaBridge: Dynamic Data and Computation Reuse for Efficient Multi-task DNN Co-evolution in Edge Systems

    Authors: Lehao Wang, Zhiwen Yu, Sicong Liu, Chenshu Wu, Xiangrui Xu, Bin Guo

    Abstract: Running multi-task DNNs on mobiles is an emerging trend for various applications like autonomous driving and mobile NLP. Mobile DNNs are often compressed to fit the limited resources and thus suffer from degraded accuracy and generalizability due to data drift. DNN evolution, e.g., continuous learning and domain adaptation, has been demonstrated effective in overcoming these issues, mostly for sin… ▽ More

    Submitted 2 May, 2024; originally announced July 2024.

    Comments: Accepted by NSDI'24 Poster

  15. Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs

    Authors: Sangwon Jeong, Mingwei Li, Matthew Berger, Shusen Liu

    Abstract: As applications of generative AI become mainstream, it is important to understand what generative models are capable of producing, and the extent to which one can predictably control their outputs. In this paper, we propose a visualization design, named Concept Lens, for jointly navigating the data distribution of a generative model, and concept manipulations supported by the model. Our work is fo… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Journal ref: 2023 IEEE Visualization and Visual Analytics (VIS), Melbourne, Australia, 2023, pp. 221-225

  16. arXiv:2406.19749  [pdf, other

    eess.IV cs.CV

    SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou

    Abstract: Automatic vessel segmentation is paramount for develo** next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  17. arXiv:2406.18927  [pdf, other

    cs.CV

    RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation

    Authors: Zhaokang Liao, Hao Feng, Shaokai Liu, Wengang Zhou, Houqiang Li

    Abstract: Fisheye images are categorized fisheye into central and deviated based on the optical center position. Existing rectification methods are limited to central fisheye images, while this paper proposes a novel method that extends to deviated fisheye image rectification. The challenge lies in the variant global distortion distribution pattern caused by the random optical center position. To address th… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  18. arXiv:2406.18583  [pdf, other

    cs.CV cs.LG

    Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

    Authors: Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

    Abstract: Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lu… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  19. arXiv:2406.18575  [pdf

    cs.CV cs.LG

    Research on Driver Facial Fatigue Detection Based on Yolov8 Model

    Authors: Chang Zhou, Yang Zhao, Shaobo Liu, Yi Zhao, Xingchen Li, Chiyu Cheng

    Abstract: In a society where traffic accidents frequently occur, fatigue driving has emerged as a grave issue. Fatigue driving detection technology, especially those based on the YOLOv8 deep learning model, has seen extensive research and application as an effective preventive measure. This paper discusses in depth the methods and technologies utilized in the YOLOv8 model to detect driver fatigue, elaborate… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by the 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS 2024), 2024 IEEE

  20. arXiv:2406.18373  [pdf, other

    cs.CL cs.SD eess.AS

    Dynamic Data Pruning for Automatic Speech Recognition

    Authors: Qiao Xiao, **chuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu

    Abstract: The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and imposed computational demands. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application in ASR has been barely explored, and existing works… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  21. arXiv:2406.18364  [pdf

    cs.CL cs.AI

    Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model

    Authors: Yiming Chen, Haobin Chen, Simin Liu, Yunyun Liu, Fanhao Zhou, Bing Wei

    Abstract: With the continuous advancement of artificial intelligence, natural language processing technology has become widely utilized in various fields. At the same time, there are many challenges in creating Chinese news summaries. First of all, the semantics of Chinese news is complex, and the amount of information is enormous. Extracting critical information from Chinese news presents a significant cha… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: submitted to ICMIII 2024

  22. Temporal Prototype-Aware Learning for Active Voltage Control on Power Distribution Networks

    Authors: Feiyang Xu, Shunyu Liu, Yunpeng Qing, Yihe Zhou, Yuwen Wang, Mingli Song

    Abstract: Active Voltage Control (AVC) on the Power Distribution Networks (PDNs) aims to stabilize the voltage levels to ensure efficient and reliable operation of power systems. With the increasing integration of distributed energy resources, recent efforts have explored employing multi-agent reinforcement learning (MARL) techniques to realize effective AVC. Existing methods mainly focus on the acquisition… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  23. arXiv:2406.17699  [pdf, other

    math.ST cs.LG

    Can independent Metropolis beat crude Monte Carlo?

    Authors: Siran Liu, Petros Dellaportas, Michalis K. Titsias

    Abstract: Assume that we would like to estimate the expected value of a function $F$ with respect to a density $π$. We prove that if $π$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler estimator that obtains samples from $π$ with proposal density $q$, enriched with a variance reduction computational strategy based on control variates, achieves smaller asymptotic… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 37 pages, 3 figures

  24. arXiv:2406.17624  [pdf, other

    cs.CL cs.AI

    Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models

    Authors: Zhiyuan Wen, Yu Yang, Jiannong Cao, Haoming Sun, Ruosong Yang, Shuaiqi Liu

    Abstract: As large language models (LLMs) appear to behave increasingly human-like in text-based interactions, more and more researchers become interested in investigating personality in LLMs. However, the diversity of psychological personality research and the rapid development of LLMs have led to a broad yet fragmented landscape of studies in this interdisciplinary field. Extensive studies across differen… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  25. arXiv:2406.17614  [pdf, other

    cs.CV cs.MM

    MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization

    Authors: Adriana Fernandez-Lopez, Honglie Chen, **chuan Ma, Lu Yin, Qiao Xiao, Stavros Petridis, Shiwei Liu, Maja Pantic

    Abstract: Pre-trained models have been a foundational approach in speech recognition, albeit with associated additional costs. In this study, we propose a regularization technique that facilitates the training of visual and audio-visual speech recognition models (VSR and AVSR) from scratch. This approach, abbreviated as \textbf{MSRS} (Multimodal Speech Recognition from Scratch), introduces a sparse regulari… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  26. arXiv:2406.17167  [pdf, other

    cs.LG

    Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis

    Authors: Hongkang Li, Meng Wang, Shuai Zhang, Sijia Liu, Pin-Yu Chen

    Abstract: Efficient training and inference algorithms, such as low-rank adaption and model pruning, have shown impressive performance for learning Transformer-based large foundation models. However, due to the technical challenges of the non-convex optimization caused by the complicated architecture of Transformers, the theoretical study of why these methods can be applied to learn Transformers is mostly el… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: IEEE SAM Workshop 2024

  27. arXiv:2406.16933  [pdf, other

    eess.SP cs.AI

    SGSM: A Foundation-model-like Semi-generalist Sensing Model

    Authors: Tianjian Yang, Hao Zhou, Shuo Liu, Kaiwen Guo, Yiwen Hou, Haohua Du, Zhi Liu, Xiang-Yang Li

    Abstract: The significance of intelligent sensing systems is growing in the realm of smart services. These systems extract relevant signal features and generate informative representations for particular tasks. However, building the feature extraction component for such systems requires extensive domain-specific expertise or data. The exceptionally rapid development of foundation models is likely to usher i… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  28. arXiv:2406.16655  [pdf, other

    cs.CL

    Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

    Authors: Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separated parts: knowledge retrieval and knowledge-free reasoning, and analyze the cross-lingual transferability of them. With adapted and const… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  29. arXiv:2406.16260  [pdf, other

    cs.CV cs.AI

    Video-Infinity: Distributed Long Video Generation

    Authors: Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang

    Abstract: Diffusion models have recently achieved remarkable results for video generation. Despite the encouraging performances, the generated videos are typically constrained to a small number of frames, resulting in clips lasting merely a few seconds. The primary challenges in producing longer videos include the substantial memory requirements and the extended processing time required on a single GPU. A s… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  30. arXiv:2406.16253  [pdf, other

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo , et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  31. arXiv:2406.16151  [pdf, other

    cs.AI cs.LG eess.SY

    Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes

    Authors: Larkin Liu, Shiqi Liu, Matej Jusup

    Abstract: In the world of stochastic control, especially in economics and engineering, Markov Decision Processes (MDPs) can effectively model various stochastic decision processes, from asset management to transportation optimization. These underlying MDPs, upon closer examination, often reveal a specifically constrained causal structure concerning the transition and reward dynamics. By exploiting this stru… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Working manuscript

    ACM Class: C.4

  32. arXiv:2406.16069  [pdf, other

    cs.CL cs.AI

    FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

    Authors: Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

    Abstract: Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before in… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  33. arXiv:2406.16005  [pdf, other

    cs.DC

    A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

    Authors: Lei Chen, Shi Liu, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu

    Abstract: With rapid advances in network hardware, far memory has gained a great deal of traction due to its ability to break the memory capacity wall. Existing far memory systems fall into one of two data paths: one that uses the kernel's paging system to transparently access far memory at the page granularity, and a second that bypasses the kernel, fetching data at the object granularity. While it is gene… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  34. arXiv:2406.15634  [pdf, other

    cs.GR

    Text-based Transfer Function Design for Semantic Volume Rendering

    Authors: Sangwon Jeong, Jixian Li, Christopher Johnson, Shusen Liu, Matthew Berger

    Abstract: Transfer function design is crucial in volume rendering, as it directly influences the visual representation and interpretation of volumetric data. However, creating effective transfer functions that align with users' visual objectives is often challenging due to the complex parameter space and the semantic gap between transfer function values and features of interest within the volume. In this wo… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  35. arXiv:2406.15609  [pdf, other

    physics.med-ph cs.AI

    Automated radiotherapy treatment planning guided by GPT-4Vision

    Authors: Sheng Liu, Oscar Pastor-Serrano, Yizheng Chen, Matthew Gopaulchan, Weixing Liang, Mark Buyyounouski, Erqi Pollom, Quynh-Thu Le, Michael Gensheimer, Peng Dong, Yong Yang, James Zou, Lei Xing

    Abstract: Radiotherapy treatment planning is a time-consuming and potentially subjective process that requires the iterative adjustment of model parameters to balance multiple conflicting objectives. Recent advancements in large foundation models offer promising avenues for addressing the challenges in planning and clinical decision-making. This study introduces GPT-RadPlan, a fully automated treatment plan… ▽ More

    Submitted 1 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  36. arXiv:2406.15486  [pdf, other

    cs.CL cs.AI cs.LG

    SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

    Authors: Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

    Abstract: Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for nea… ▽ More

    Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  37. arXiv:2406.14593  [pdf, other

    cs.LG

    Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA

    Authors: Hao Mark Chen, Liam Castelli, Martin Ferianc, Hongyu Zhou, Shuanglong Liu, Wayne Luk, Hongxiang Fan

    Abstract: Reliable uncertainty estimation plays a crucial role in various safety-critical applications such as medical diagnosis and autonomous driving. In recent years, Bayesian neural networks (BayesNNs) have gained substantial research and industrial interests due to their capability to make accurate predictions with reliable uncertainty estimation. However, the algorithmic complexity and the resulting h… ▽ More

    Submitted 24 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.06849

  38. arXiv:2406.14558  [pdf, other

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, **gbo Wang, Tai Wang, **kun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  39. arXiv:2406.14556  [pdf, other

    cs.RO cs.CV

    Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

    Authors: Yuan Chen, Zi-han Ding, Ziqin Wang, Yan Wang, Lijun Zhang, Si Liu

    Abstract: Despite real-time planners exhibiting remarkable performance in autonomous driving, the growing exploration of Large Language Models (LLMs) has opened avenues for enhancing the interpretability and controllability of motion planning. Nevertheless, LLM-based planners continue to encounter significant challenges, including elevated resource consumption and extended inference times, which pose substa… ▽ More

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  40. arXiv:2406.13951  [pdf, other

    cs.CV

    Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling

    Authors: Shuaixin Liu, Kunqian Li, Yilin Ding, Kuangwei Xu, Qianli Jiang, Q. M. Jonathan Wu, Dalei Song

    Abstract: We introduce a novel vision-based framework for in-situ trunk identification and length measurement of sea cucumbers, which plays a crucial role in the monitoring of marine ranching resources and mechanized harvesting. To model sea cucumber trunk curves with varying degrees of bending, we utilize the parametric Bézier curve due to its computational simplicity, stability, and extensive range of tra… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  41. arXiv:2406.13910  [pdf, other

    cs.RO cs.GR

    A-OctoMap: An Adaptive OctoMap for Online Motion Planning

    Authors: Yihui Mao, Shuo Liu

    Abstract: Traditional robotic motion planning methods often struggle with fixed resolutions in dynamically changing environments. To address these challenges, we introduce the A-OctoMap, an adaptive Octo-Tree structure that enhances spatial representation and facilitates real-time, efficient motion planning. This novel framework allows for dynamic space partitioning and multi-resolution queries, significant… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 8 pages, 6 figures

  42. arXiv:2406.13344  [pdf, other

    cs.CV

    WaterMono: Teacher-Guided Anomaly Masking and Enhancement Boosting for Robust Underwater Self-Supervised Monocular Depth Estimation

    Authors: Yilin Ding, Kunqian Li, Han Mei, Shuaixin Liu, Guojia Hou

    Abstract: Depth information serves as a crucial prerequisite for various visual tasks, whether on land or underwater. Recently, self-supervised methods have achieved remarkable performance on several terrestrial benchmarks despite the absence of depth annotations. However, in more challenging underwater scenarios, they encounter numerous brand-new obstacles such as the influence of marine life and degradati… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  43. arXiv:2406.13201  [pdf, other

    cs.LG cs.SI

    Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach

    Authors: Yicong Li, Yu Yang, Jiannong Cao, Shuaiqi Liu, Haoran Tang, Guandong Xu

    Abstract: Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic graph embedding remains an open problem. Neglecting degree changes in dynamic graphs will significantly impair embedding effectiveness without notably… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  44. arXiv:2406.13125  [pdf, other

    cs.AI

    A Unified Framework for Combinatorial Optimization Based on Graph Neural Networks

    Authors: Yaochu **, Xueming Yan, Shiqing Liu, Xiangyu Wang

    Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for solving combinatorial optimization problems (COPs), exhibiting state-of-the-art performance in both graph-structured and non-graph-structured domains. However, existing approaches lack a unified framework capable of addressing a wide range of COPs. After presenting a summary of representative COPs and a brief review of recent advance… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  45. arXiv:2406.13007  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, **gyuan Xiao , et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  46. arXiv:2406.12834  [pdf, other

    cs.CV

    GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

    Authors: Ci-Siang Lin, I-Jieh Liu, Min-Hung Chen, Chien-Yi Wang, Sifei Liu, Yu-Chiang Frank Wang

    Abstract: Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the query sentence throughout the entire video. Most existing methods require end-to-end training with dense mask annotations, which could be computation-consuming and less scalable. In this work, we aim to efficiently adapt foundation segmentation models for addressing RVOS from weak supervision with the proposed… ▽ More

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: CVPR Workshop (CVinW) 2024. Project page: https://jack24658735.github.io/groprompt/

  47. arXiv:2406.12468  [pdf, other

    cs.CL cs.AI

    Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities

    Authors: Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei, Hongcheng Gao, Yilong Xu, Xueqi Cheng

    Abstract: The parametric knowledge memorized by large language models (LLMs) becomes outdated quickly. In-context editing (ICE) is currently the most effective method for updating the knowledge of LLMs. Recent advancements involve enhancing ICE by modifying the decoding strategy, obviating the need for altering internal model structures or adjusting external prompts. However, this enhancement operates acros… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  48. arXiv:2406.12382  [pdf, other

    cs.CL

    From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

    Authors: Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Sheng** Liu, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) have acquired the ability to solve general tasks by utilizing instruction finetuning (IFT). However, IFT still relies heavily on instance training of extensive task data, which greatly limits the adaptability of LLMs to real-world scenarios where labeled task instances are scarce and broader task generalization becomes paramount. Contrary to LLMs, humans acquire skills… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  49. arXiv:2406.12360  [pdf, other

    cs.LG

    UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models

    Authors: Yue Jiang, Qin Chao, Yile Chen, Xiucheng Li, Shuai Liu, Gao Cong

    Abstract: Location-based services play an critical role in improving the quality of our daily lives. Despite the proliferation of numerous specialized AI models within spatio-temporal context of location-based services, these models struggle to autonomously tackle problems regarding complex urban planing and management. To bridge this gap, we introduce UrbanLLM, a fine-tuned large language model (LLM) desig… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Submitted for EMNLP 2024

  50. arXiv:2406.12282  [pdf, other

    cs.LG

    SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting

    Authors: Yue Jiang, Xiucheng Li, Yile Chen, Shuai Liu, Weilong Kong, Antonis F. Lentzakis, Gao Cong

    Abstract: Time series forecasting is essential for our daily activities and precise modeling of the complex correlations and shared patterns among multiple time series is essential for improving forecasting performance. Spatial-Temporal Graph Neural Networks (STGNNs) are widely used in multivariate time series forecasting tasks and have achieved promising performance on multiple real-world datasets for thei… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted at ICDE 2024