Showing 1–50 of 218 results for author: Duan, J

Searching in archive cs.
  1. arXiv:2407.00499  [pdf, other]

    cs.CL cs.AI cs.LG

    ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

    Authors: Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

    Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures, 6 tables

  2. arXiv:2406.18915  [pdf, other]

    cs.RO cs.CV

    Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

    Authors: Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

    Abstract: Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited t…

    Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://robot-ma.github.io/

  3. arXiv:2406.15486  [pdf, other]

    cs.CL cs.AI cs.LG

    SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

    Authors: Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

    Abstract: Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for nea…

    Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.15279  [pdf, other]

    cs.AI cs.CL

    Cross-Modality Safety Alignment

    Authors: Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang

    Abstract: As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Input…

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.14326  [pdf, other]

    cs.CL

    medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs

    Authors: Mingyi Jia, Junwen Duan, Yan Song, Jianxin Wang

    Abstract: Electronic Medical Records (EMRs), while integral to modern healthcare, present challenges for clinical reasoning and diagnosis due to their complexity and information redundancy. To address this, we proposed medIKAL (Integrating Knowledge Graphs as Assistants of LLMs), a framework that combines Large Language Models (LLMs) with knowledge graphs (KGs) to enhance diagnostic capabilities. medIKAL as…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.10721  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

    Authors: Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

    Abstract: From rearranging objects on a table to putting groceries into shelves, robots must plan precise action points to perform tasks accurately and reliably. In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language. We introduce an automatic synthetic data generation pipeline that instruction-tunes VLMs…

    Submitted 15 June, 2024; originally announced June 2024.

  7. arXiv:2405.15177  [pdf, other]

    cs.LG cs.AI

    Diffusion Actor-Critic with Entropy Regulator

    Authors: Yinuo Wang, Likun Wang, Yuxuan Jiang, Wenjun Zou, Tong Liu, Xujie Song, Wenxuan Wang, Liming Xiao, Jiang Wu, Jingliang Duan, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this problem, we propose an online RL algorithm termed diff…

    Submitted 15 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.06219  [pdf, other]

    cs.LG cs.CL

    SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

    Authors: Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin

    Abstract: Large language models (LLMs) can now handle longer sequences of tokens, enabling complex tasks like book understanding and generating lengthy novels. However, the key-value (KV) cache required for LLMs consumes substantial memory as context length increasing, becoming the bottleneck for deployment. In this paper, we present a strategy called SKVQ, which stands for sliding-window KV cache quantizat…

    Submitted 13 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  9. arXiv:2405.02794  [pdf, other]

    cs.RO

    Octopi: Object Property Reasoning with Large Tactile-Language Models

    Authors: Samson Yu, Kelvin Lin, Anxing Xiao, Jiafei Duan, Harold Soh

    Abstract: Physical reasoning is important for effective robot manipulation. Recent work has investigated both vision and language modalities for physical reasoning; vision can reveal information about objects in the environment and language serves as an abstraction and communication medium for additional context. Although these works have demonstrated success on a variety of physical reasoning tasks, they a…

    Submitted 4 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted at Robotics: Science and Systems (R:SS 2024)

  10. arXiv:2404.08023  [pdf, other]

    q-bio.QM cs.LG

    Pathology-genomic fusion via biologically informed cross-modality graph learning for survival analysis

    Authors: Zeyu Zhang, Yuanshen Zhao, Jingxian Duan, Yaou Liu, Hairong Zheng, Dong Liang, Zhenyu Zhang, Zhi-Cheng Li

    Abstract: The diagnosis and prognosis of cancer are typically based on multi-modal clinical data, including histology images and genomic data, due to the complex pathogenesis and high heterogeneity. Despite the advancements in digital pathology and high-throughput genome sequencing, establishing effective multi-modal fusion models for survival prediction and revealing the potential association between histo…

    Submitted 11 April, 2024; originally announced April 2024.

  11. arXiv:2404.06089  [pdf, other]

    cs.HC cs.RO

    EVE: Enabling Anyone to Train Robots using Augmented Reality

    Authors: Jun Wang, Chun-Cheng Chang, Jiafei Duan, Dieter Fox, Ranjay Krishna

    Abstract: The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training robots to automate tasks typically requires physical robots and expensive demonstration data from trained human annotators. Consequently, only those with access to physical robots produce demonstrations to train robots. To mitigate this issue, we introduce EVE, an iO…

    Submitted 18 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages

  12. arXiv:2404.03179  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

    Authors: Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng

    Abstract: Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio…

    Submitted 3 April, 2024; originally announced April 2024.

  13. arXiv:2404.02445  [pdf, other]

    cs.DC

    MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms

    Authors: Jiaang Duan, Shiyou Qian, Dingyu Yang, Hanwen Hu, Jian Cao, Guangtao Xue

    Abstract: With its elastic power and a pay-as-you-go cost model, the deployment of deep learning inference services (DLISs) on serverless platforms is emerging as a prevalent trend. However, the varying resource requirements of different layers in DL models hinder resource utilization and increase costs, when DLISs are deployed as a single function on serverless platforms. To tackle this problem, we propose…

    Submitted 3 April, 2024; originally announced April 2024.

  14. arXiv:2404.02015  [pdf, other]

    cs.DC

    MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving

    Authors: Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, Hao Zhang

    Abstract: Large language models (LLMs) have demonstrated remarkable performance, and organizations are racing to serve LLMs of varying sizes as endpoints for use-cases like chat, programming and search. However, efficiently serving multiple LLMs poses significant challenges for existing approaches due to varying popularity of LLMs. In the paper, we present MuxServe, a flexible spatial-temporal multiplexing…

    Submitted 12 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  15. arXiv:2403.19931  [pdf, other]

    cs.NI

    DHNet: A Distributed Network Architecture for Smart Home

    Authors: Chaoqi Zhou, Jingpu Duan, YuPeng Xiao, Qing Li, Dingding Chen, Ruobin Zheng, Shaoteng Liu

    Abstract: With the increasing popularity of smart homes, more and more devices need to connect to home networks. Traditional home networks mainly rely on centralized networking, where an excessive number of devices in the centralized topology can increase the pressure on the central router, potentially leading to decreased network performance metrics such as communication latency. To address the latency per…

    Submitted 28 March, 2024; originally announced March 2024.

  16. arXiv:2403.17032  [pdf, ps, other]

    cs.LG

    Stochastic parameter reduced-order model based on hybrid machine learning approaches

    Authors: Cheng Fang, Jinqiao Duan

    Abstract: Establishing appropriate mathematical models for complex systems in natural phenomena not only helps deepen our understanding of nature but can also be used for state estimation and prediction. However, the extreme complexity of natural phenomena makes it extremely challenging to develop full-order models (FOMs) and apply them to studying many quantities of interest. In contrast, appropriate reduc…

    Submitted 24 March, 2024; originally announced March 2024.

  17. arXiv:2403.16240  [pdf, other]

    cs.CV math.DS math.OC

    Low Rank Groupwise Deformations for Motion Tracking in Cardiac Cine MRI

    Authors: Sean Rendell, Jinming Duan

    Abstract: Diffeomorphic image registration is a commonly used method to deform one image to resemble another. While warping a single image to another is useful, it can be advantageous to warp multiple images simultaneously, such as in tracking the motion of the heart across a sequence of images. In this paper, our objective is to propose a novel method capable of registering a group or sequence of images to…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: A thesis submitted to the University of Birmingham for MSc Degree

  18. arXiv:2403.15447  [pdf, other]

    cs.CL cs.AI

    Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

    Authors: Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li

    Abstract: Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation o…

    Submitted 4 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to ICML'24

  19. arXiv:2403.14097  [pdf, other]

    cs.DC

    Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

    Authors: Jiangfei Duan, Ziang Song, Xupeng Miao, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, Zhihao Jia

    Abstract: Deep neural networks (DNNs) are becoming progressively large and costly to train. This paper aims to reduce DNN training costs by leveraging preemptible instances on modern clouds, which can be allocated at a much lower price when idle but may be preempted by the cloud provider at any time. Prior work that supports DNN training on preemptive instances employs a reactive approach to handling instan…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: NSDI '24

  20. arXiv:2403.12847  [pdf, other]

    cs.LG

    Policy Bifurcation in Safe Reinforcement Learning

    Authors: Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li

    Abstract: Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner; however, our research finds that in some scenarios, the feasible policy should be discontinuous or multi-valued, interpolating between discontinuous l…

    Submitted 28 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  21. arXiv:2403.12152  [pdf, other]

    cs.CV

    Development of Automated Neural Network Prediction for Echocardiographic Left ventricular Ejection Fraction

    Authors: Yuting Zhang, Boyang Liu, Karina V. Bunting, David Brind, Alexander Thorley, Andreas Karwath, Wenqi Lu, Diwei Zhou, Xiaoxia Wang, Alastair R. Mobley, Otilia Tica, Georgios Gkoutos, Dipak Kotecha, Jinming Duan

    Abstract: The echocardiographic measurement of left ventricular ejection fraction (LVEF) is fundamental to the diagnosis and classification of patients with heart failure (HF). In order to quantify LVEF automatically and accurately, this paper proposes a new pipeline method based on deep neural networks and ensemble learning. Within the pipeline, an Atrous Convolutional Neural Network (ACNN) was first train…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to Frontiers in Medicine

  22. arXiv:2403.11484  [pdf, other]

    cs.RO

    Robot Navigation in Unknown and Cluttered Workspace with Dynamical System Modulation in Starshaped Roadmap

    Authors: Kai Chen, Haichao Liu, Yulin Li, Jianghua Duan, Lei Zhu, Jun Ma

    Abstract: This paper presents a novel reactive motion planning framework for navigating robots in unknown and cluttered 2D workspace. Typical existing methods are developed by enforcing the robot staying in free regions represented by the locally extracted ellipse or polygon. Instead, we navigate the robot in free space with an alternate starshaped decomposition, which is calculated directly from real-time…

    Submitted 18 March, 2024; originally announced March 2024.

  23. arXiv:2403.09351  [pdf, other]

    cs.CR

    LDPRecover: Recovering Frequencies from Poisoning Attacks against Local Differential Privacy

    Authors: Xinyue Sun, Qingqing Ye, Haibo Hu, Jiawei Duan, Tianyu Wo, Jie Xu, Renyu Yang

    Abstract: Local differential privacy (LDP), which enables an untrusted server to collect aggregated statistics from distributed users while protecting the privacy of those users, has been widely deployed in practice. However, LDP protocols for frequency estimation are vulnerable to poisoning attacks, in which an attacker can poison the aggregated frequencies by manipulating the data sent from malicious user…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted by ICDE 2024

  24. arXiv:2403.07289  [pdf, other]

    cs.CV

    Rediscovering BCE Loss for Uniform Classification

    Authors: Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan

    Abstract: This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than adaptive threshold classifying each individual sample. We also propose the uniform classification accuracy as a metric to measure the model's performance in uniform classification. Furthermore, begin with a naive loss, we mathematically derive a loss function suitable…

    Submitted 11 March, 2024; originally announced March 2024.

  25. arXiv:2403.01210  [pdf, other]

    cs.CV cs.AI

    SAR-AE-SFP: SAR Imagery Adversarial Example in Real Physics domain with Target Scattering Feature Parameters

    Authors: Jiahao Cui, Jiale Duan, Binyan Luo, Hang Cao, Wang Guo, Haifeng Li

    Abstract: Deep neural network-based Synthetic Aperture Radar (SAR) target recognition models are susceptible to adversarial examples. Current adversarial example generation methods for SAR imagery primarily operate in the 2D digital domain, known as image adversarial examples. Recent work, while considering SAR imaging scatter mechanisms, fails to account for the actual imaging process, rendering attacks in…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 10 pages, 9 figures, 2 tables

  26. arXiv:2402.19150  [pdf, other]

    cs.CV

    Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model

    Authors: Hao Cheng, Erjia Xiao, Jindong Gu, Le Yang, Jinhao Duan, Jize Zhang, Jiahang Cao, Kaidi Xu, Renjing Xu

    Abstract: Large Vision-Language Models (LVLMs) rely on vision encoders and Large Language Models (LLMs) to exhibit remarkable capabilities on various multi-modal tasks in the joint space of vision and language. However, the Typographic Attack, which disrupts vision-language models (VLMs) such as Contrastive Language-Image Pretraining (CLIP), has also been expected to be a security threat to LVLMs. Firstly,…

    Submitted 21 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  27. arXiv:2402.15525  [pdf, other]

    cs.CL cs.CY

    Detecting misinformation through Framing Theory: the Frame Element-based Model

    Authors: Guan Wang, Rebecca Frederick, Jinglong Duan, William Wong, Verica Rupar, Weihua Li, Quan Bai

    Abstract: In this paper, we delve into the rapidly evolving challenge of misinformation detection, with a specific focus on the nuanced manipulation of narrative frames - an under-explored area within the AI community. The potential for Generative AI models to generate misleading narratives underscores the urgency of this problem. Drawing from communication and framing theories, we posit that the presentati…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 17 pages, 9 figures, 7 tables

  28. arXiv:2402.14259  [pdf, other]

    cs.CL cs.AI cs.LG

    Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

    Authors: Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Huaxiu Yao, Yue Zhang, Ren Wang, Kaidi Xu, Xiaoshuang Shi

    Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the pri…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 18 pages

  29. arXiv:2402.12348  [pdf, other]

    cs.CL cs.AI cs.LG

    GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

    Authors: Jinhao Duan, Renming Zhang, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Elias Stengel-Eskin, Mohit Bansal, Tianlong Chen, Kaidi Xu

    Abstract: As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a langu…

    Submitted 10 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 26 pages; the first two authors contributed equally; GTBench HF Leaderboard: https://huggingface.co/spaces/GTBench/GTBench

  30. arXiv:2402.08191  [pdf, other]

    cs.RO cs.AI cs.LG

    THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

    Authors: Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox

    Abstract: To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions. Unfortunately, a majority of studies evaluate robot performance in environments closely resembling or even identical to the training setup. We present THE COLOSSEUM, a novel simulation benchmark, with 20 diverse manipulation tasks, that enabl…

    Submitted 27 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: RSS 2024. 33 pages

  31. arXiv:2402.03585  [pdf, other]

    cs.CV eess.IV

    Decoder-Only Image Registration

    Authors: Xi Jia, Wenqi Lu, Xinxing Cheng, Jinming Duan

    Abstract: In unsupervised medical image registration, the predominant approaches involve the utilization of a encoder-decoder network architecture, allowing for precise prediction of dense, full-resolution displacement fields from given paired images. Despite its widespread use in the literature, we argue for the necessity of making both the encoder and decoder learnable in such an architecture. For this, w…

    Submitted 5 February, 2024; originally announced February 2024.

  32. arXiv:2402.00450  [pdf, other]

    cs.LG

    CPT: Competence-progressive Training Strategy for Few-shot Node Classification

    Authors: Qilong Yan, Yufeng Zhang, Jinghao Zhang, Jingpu Duan, Jian Yin

    Abstract: Graph Neural Networks (GNNs) have made significant advancements in node classification, but their success relies on sufficient labeled nodes per class in the training data. Real-world graph data often exhibits a long-tail distribution with sparse labels, emphasizing the importance of GNNs' ability in few-shot node classification, which entails categorizing nodes with limited data. Traditional epis…

    Submitted 23 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.11972 by other authors

  33. arXiv:2401.11261  [pdf, other]

    cs.LG cs.CV

    Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient

    Authors: Weiguo Lu, Xuan Wu, Deng Ding, Jinqiao Duan, Jirong Zhuang, Gangnan Yuan

    Abstract: Diffusion models (DMs) are a type of generative model that has a huge impact on image synthesis and beyond. They achieve state-of-the-art generation results in various generative tasks. A great diversity of conditioning inputs, such as text or bounding boxes, are accessible to control the generation. In this work, we propose a conditioning mechanism utilizing Gaussian mixture models (GMMs) as feat…

    Submitted 1 February, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

  34. arXiv:2401.10416  [pdf, other]

    cs.CV

    DataViz3D: An Novel Method Leveraging Online Holographic Modeling for Extensive Dataset Preprocessing and Visualization

    Authors: Jinli Duan

    Abstract: DataViz3D is an innovative online software that transforms complex datasets into interactive 3D spatial models using holographic technology. This tool enables users to generate scatter plot within a 3D space, accurately mapped to the XYZ coordinates of the dataset, providing a vivid and intuitive understanding of the spatial relationships inherent in the data. DataViz3D's user friendly interface m…

    Submitted 18 January, 2024; originally announced January 2024.

  35. arXiv:2401.00657  [pdf, other]

    math.OC cs.CV math.SP

    Optimizing ADMM and Over-Relaxed ADMM Parameters for Linear Quadratic Problems

    Authors: Jintao Song, Wenqi Lu, Yunwen Lei, Yuchao Tang, Zhenkuan Pan, Jinming Duan

    Abstract: The Alternating Direction Method of Multipliers (ADMM) has gained significant attention across a broad spectrum of machine learning applications. Incorporating the over-relaxation technique shows potential for enhancing the convergence rate of ADMM. However, determining optimal algorithmic parameters, including both the associated penalty and relaxation parameters, often relies on empirical approa…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  36. arXiv:2312.10948  [pdf, other]

    cs.CV cs.AI

    A Multimodal Approach for Advanced Pest Detection and Classification

    Authors: Jinli Duan, Haoyu Ding, Sung Kim

    Abstract: This paper presents a novel multi modal deep learning framework for enhanced agricultural pest detection, combining tiny-BERT's natural language processing with R-CNN and ResNet-18's image processing. Addressing limitations of traditional CNN-based visual methods, this approach integrates textual context for more accurate pest identification. The R-CNN and ResNet-18 integration tackles deep CNN is…

    Submitted 18 December, 2023; originally announced December 2023.

  37. A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

    Authors: Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, Yue Zhang

    Abstract: Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained t…

    Submitted 20 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  38. arXiv:2312.01836  [pdf, other]

    cs.RO cs.AI

    Integrated Drill Boom Hole-Seeking Control via Reinforcement Learning

    Authors: Haoqi Yan, Haoyuan Xu, Hongbo Gao, Fei Ma, Shengbo Eben Li, Jingliang Duan

    Abstract: Intelligent drill boom hole-seeking is a promising technology for enhancing drilling efficiency, mitigating potential safety hazards, and relieving human operators. Most existing intelligent drill boom control methods rely on a hierarchical control framework based on inverse kinematics. However, these methods are generally time-consuming due to the computational complexity of inverse kinematics an…

    Submitted 4 December, 2023; originally announced December 2023.

  39. arXiv:2312.00084  [pdf, other]

    cs.CV

    Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?

    Authors: Zhengyue Zhao, Jinhao Duan, Kaidi Xu, Chenan Wang, Rui Zhang, Zidong Du, Qi Guo, Xing Hu

    Abstract: Stable Diffusion has established itself as a foundation model in generative AI artistic applications, receiving widespread research and application. Some recent fine-tuning methods have made it feasible for individuals to implant personalized concepts onto the basic Stable Diffusion model with minimal computational costs on small datasets. However, these innovations have also given rise to issues…

    Submitted 24 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  40. arXiv:2311.15566  [pdf, other]

    cs.DC cs.CL cs.LG

    SpotServe: Serving Generative Large Language Models on Preemptible Instances

    Authors: Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

    Abstract: The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary cost for serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer accesses to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time. Serving LLMs on pre…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: ASPLOS 2024

  41. arXiv:2311.14097  [pdf, other]

    cs.CV

    ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

    Authors: Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

    Abstract: Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation speeds. Consistency training addresses this issue with single-step sampling but often produces lower-quality generations and requires high training costs. In this paper, we show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributio…

    Submitted 28 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: To appear in CVPR 2024

  42. arXiv:2311.12308  [pdf, other]

    cs.DC cs.SE

    Jup2Kub: algorithms and a system to translate a Jupyter Notebook pipeline to a fault tolerant distributed Kubernetes deployment

    Authors: Jinli Duan, Shasha Dennis

    Abstract: Scientific workflows facilitate computational, data manipulation, and sometimes visualization steps for scientific data analysis. They are vital for reproducing and validating experiments, usually involving computational steps in scientific simulations and data analysis. These workflows are often developed by domain scientists using Jupyter notebooks, which are convenient yet face limitations: the…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: for associated software, see https://github.com/shirou10086/Scientificworkflow

  43. arXiv:2311.04193  [pdf, other]

    cs.CV cs.AI

    Selective Visual Representations Improve Convergence and Generalization for Embodied AI

    Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

    Abstract: Embodied AI models often employ off the shelf vision backbones like CLIP to encode their visual observations. Although such general purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise within the learning process and distracts the agent's focus from task-relevant visu…

    Submitted 9 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: See project website: https://embodied-codebook.github.io

  44. arXiv:2311.03408  [pdf, other]

    cs.LG cs.AI cs.NE quant-ph

    Training Multi-layer Neural Networks on Ising Machine

    Authors: Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li

    Abstract: As a dedicated quantum device, Ising machines could solve large-scale binary optimization problems in milliseconds. There is emerging interest in utilizing Ising machines to train feedforward neural networks due to the prosperity of generative artificial intelligence. However, existing methods can only train single-layer feedforward networks because of the complex nonlinear network topology. This…

    Submitted 5 November, 2023; originally announced November 2023.

  45. arXiv:2311.02523  [pdf, other]

    cs.CV cs.AI cs.HC

    UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition

    Authors: Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan

    Abstract: Sample-to-class-based face recognition models cannot fully explore the cross-sample relationship among large amounts of facial images, while sample-to-sample-based models require sophisticated pairing processes for training. Furthermore, neither method satisfies the requirements of real-world face verification applications, which expect a unified threshold separating positive from negative facial…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Accepted by NeurIPS 2023

  46. arXiv:2311.02344  [pdf, other]

    cs.CL cs.AI

    You Only Forward Once: Prediction and Rationalization in A Single Forward Pass

    Authors: Han Jiang, Junwen Duan, Zhe Qu, Jianxin Wang

    Abstract: Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any annotated rationale. Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm. They assumed that the extracted explanation, called rationale, should be sufficient to predi…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: 20 pages, 5 figures, and 11 tables

  47. arXiv:2310.19022  [pdf, other]

    math.OC cs.LG eess.SY

    Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

    Authors: Jingliang Duan, Jie Li, Xuyang Chen, Kai Zhao, Shengbo Eben Li, Lin Zhao

    Abstract: In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the opti…

    Submitted 29 October, 2023; originally announced October 2023.

    Journal ref: IEEE Transactions on Cybernetics, 2023

  48. arXiv:2310.14906  [pdf, other]

    cs.LG cs.AI

    DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

    Authors: Weijie Liu, Xiaoxi Zhang, Jingpu Duan, Carlee Joe-Wong, Zhi Zhou, Xu Chen

    Abstract: Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data. While prior works have focused on analyzing FL convergence with respect to hyperparameters like batch size and aggregation frequency, the joint effects of adjusting these parameters on model performance, training time, and resource consum…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 20 pages, 12 figures

    ACM Class: I.2.6

  49. arXiv:2310.07018  [pdf, other]

    cs.CL cs.AI cs.RO

    NEWTON: Are Large Language Models Capable of Physical Reasoning?

    Authors: Yi Ru Wang, Jiafei Duan, Dieter Fox, Siddhartha Srinivasa

    Abstract: Large Language Models (LLMs), through their contextualized representations, have been empirically proven to encapsulate syntactic, semantic, word sense, and common-sense knowledge. However, there has been limited exploration of their physical reasoning abilities, specifically concerning the crucial attributes for comprehending everyday objects. To address this gap, we introduce NEWTON, a repositor…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings; 8 pages, 3 figures, 7 tables; Project page: https://newtonreasoning.github.io

  50. arXiv:2310.06059  [pdf, other]

    cs.LG math.DS

    Early Warning Prediction with Automatic Labeling in Epilepsy Patients

    Authors: Peng Zhang, Ting Gao, Jin Guo, Jinqiao Duan, Sergey Nikolenko

    Abstract: Early warning for epilepsy patients is crucial for their safety and well-being, in particular to prevent or minimize the severity of seizures. Through the patients' EEG data, we propose a meta learning framework to improve the prediction of early ictal signals. The proposed bi-level optimization framework can help automatically label noisy data at the early ictal stage, as well as optimize the tra…

    Submitted 11 January, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 13 pages, 4 figures