Showing 1–3 of 3 results for author: Phute, M

  1. arXiv:2404.01361  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    LLM Attributor: Interactive Visual Attribution for LLM Generation

    Authors: Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy, Alec Helbling, ShengYun Peng, Mansi Phute, Duen Horng Chau, Minsuk Kahng

    Abstract: While large language models (LLMs) have shown remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library o…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures, For a video demo, see https://youtu.be/mIG2MDQKQxM

  2. arXiv:2308.16258  [pdf, other]

    cs.CV

    Robust Principles: Architectural Design Principles for Adversarially Robust CNNs

    Authors: ShengYun Peng, Weilin Xu, Cory Cornelius, Matthew Hull, Kevin Li, Rahul Duggal, Mansi Phute, Jason Martin, Duen Horng Chau

    Abstract: Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) optimal range for depth and width configurations, (b) preferring convolutional over patchify stem stage, and (c) robust residual block design through…

    Submitted 31 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Published at BMVC'23

  3. arXiv:2308.07308  [pdf, other]

    cs.CL cs.AI

    LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked

    Authors: Mansi Phute, Alec Helbling, Matthew Hull, ShengYun Peng, Sebastian Szyller, Cory Cornelius, Duen Horng Chau

    Abstract: Large language models (LLMs) are popular for high-quality text generation but can produce harmful content, even when aligned with human values through reinforcement learning. Adversarial prompts can bypass their safety measures. We propose LLM Self Defense, a simple approach to defend against these attacks by having an LLM screen the induced responses. Our method does not require any fine-tuning,…

    Submitted 2 May, 2024; v1 submitted 14 August, 2023; originally announced August 2023.