Showing 1–12 of 12 results for author: Mu, R

  1. arXiv:2406.02622  [pdf, other]

    cs.CR cs.AI

    Safeguarding Large Language Models: A Survey

    Authors: Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang

    Abstract: In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced int…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: under review. arXiv admin note: text overlap with arXiv:2402.01822

  2. arXiv:2405.03807  [pdf, other]

    cs.RO cs.LG

    UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios

    Authors: Reza Mahjourian, Rongbing Mu, Valerii Likhosherstov, Paul Mougin, Xiukun Huang, Joao Messias, Shimon Whiteson

    Abstract: This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenari…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at ICRA 2024

  3. arXiv:2402.17729  [pdf, other]

    cs.CV

    Towards Fairness-Aware Adversarial Learning

    Authors: Yanghao Zhang, Tianle Zhang, Ronghui Mu, Xiaowei Huang, Wenjie Ruan

    Abstract: Although adversarial training (AT) has proven effective in enhancing the model's robustness, the recently revealed issue of fairness in robustness has not been well addressed, i.e. the robust accuracy varies significantly among different categories. In this paper, instead of uniformly evaluating the model's average class performance, we delve into the issue of robust fairness, by considering the w…

    Submitted 27 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: This work will appear in the CVPR 2024 conference proceedings

  4. arXiv:2402.01822  [pdf, ps, other]

    cs.CL cs.AI

    Building Guardrails for Large Language Models

    Authors: Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang

    Abstract: As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard,…

    Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  5. arXiv:2312.06436  [pdf, other]

    cs.LG cs.AI

    Reward Certification for Policy Smoothed Reinforcement Learning

    Authors: Ronghui Mu, Leandro Soriano Marcolino, Tianle Zhang, Yanghao Zhang, Xiaowei Huang, Wenjie Ruan

    Abstract: Reinforcement Learning (RL) has achieved remarkable success in safety-critical areas, but it can be weakened by adversarial attacks. Recent studies have introduced "smoothed policies" in order to enhance its robustness. Yet, it is still challenging to establish a provable guarantee to certify the bound of its total reward. Prior methods relied primarily on computing bounds using Lipschitz continui…

    Submitted 12 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: This paper will be presented in AAAI2024

  6. arXiv:2307.04327  [pdf]

    cs.RO eess.SY

    Legal Decision-making for Highway Automated Driving

    Authors: Xiaohan Ma, Wenhao Yu, Chengxiang Zhao, Changjun Wang, Wenhui Zhou, Guangming Zhao, Mingyue Ma, Weida Wang, Lin Yang, Rui Mu, Hong Wang, Jun Li

    Abstract: Compliance with traffic laws is a fundamental requirement for human drivers on the road, and autonomous vehicles must adhere to traffic laws as well. However, current autonomous vehicles prioritize safety and collision avoidance primarily in their decision-making and planning, which will lead to misunderstandings and distrust from human drivers and may even result in accidents in mixed traffic flo…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 14 pages, 17 figures

  7. arXiv:2305.11391  [pdf, other]

    cs.AI cs.LG

    A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

    Authors: Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa

    Abstract: Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of the LLMs, categorisi…

    Submitted 27 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  8. arXiv:2304.07511  [pdf, other]

    cs.HC

    Pilgrimage to Pureland: Art, Perception and the Wutai Mural VR Reconstruction

    Authors: Rongxuan Mu, Yuhe Nie, Kent Cao, Ruoxin You, Yinzong Wei, Xin Tong

    Abstract: Virtual reality (VR) supports audiences to engage with cultural heritage proactively. We designed an easy-to-access and guided Pilgrimage To Pureland VR reconstruction of Dunhuang Mogao Grottoes to offer the general public an accessible and engaging way to explore the Dunhuang murals. We put forward an immersive VR reconstruction paradigm that can efficiently convert complex 2D artwork into a VR e…

    Submitted 15 April, 2023; originally announced April 2023.

  9. arXiv:2303.10653  [pdf, other]

    cs.LG cs.AI

    Randomized Adversarial Training via Taylor Expansion

    Authors: Gaojie Jin, Xinping Yi, Dengyu Wu, Ronghui Mu, Xiaowei Huang

    Abstract: In recent years, there has been an explosion of research into developing more robust deep neural networks against adversarial examples. Adversarial training appears as one of the most successful methods. To deal with both the robustness against adversarial examples and the accuracy over clean examples, many works develop enhanced adversarial training methods to achieve various trade-offs between t…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  10. arXiv:2212.11746  [pdf, other]

    cs.LG cs.MA

    Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning

    Authors: Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Gaojie Jin, Qiang Ni

    Abstract: Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, thus the analysis of robustness for c-MARL models is profoundly important. However, robustness certification for c-MARLs has not yet been explored in the community. In this paper, we propose a novel certification method, which is the first work to leverage a scalable approach for c-MARLs to dete…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: This paper will appear in AAAI2023

  11. arXiv:2207.07539  [pdf, other]

    cs.CV cs.LG

    3DVerifier: Efficient Robustness Verification for 3D Point Cloud Models

    Authors: Ronghui Mu, Wenjie Ruan, Leandro S. Marcolino, Qiang Ni

    Abstract: 3D point cloud models are widely applied in safety-critical scenes, which delivers an urgent need to obtain more solid proofs to verify the robustness of models. Existing verification method for point cloud model is time-expensive and computationally unattainable on large networks. Additionally, they cannot handle the complete PointNet model with joint alignment network (JANet) that contains multi…

    Submitted 15 July, 2022; originally announced July 2022.

  12. arXiv:2111.05468  [pdf, other]

    cs.CV

    Sparse Adversarial Video Attacks with Spatial Transformations

    Authors: Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Qiang Ni

    Abstract: In recent years, a significant amount of research efforts concentrated on adversarial attacks on images, while adversarial video attacks have seldom been explored. We propose an adversarial attack strategy on videos, called DeepSAVA. Our model includes both additive perturbation and spatial transformation by a unified optimisation framework, where the structural similarity index (SSIM) measure is…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: The short version of this work will appear in the BMVC 2021 conference