Showing 1–6 of 6 results for author: Si, W M

  1. arXiv:2407.06955  [pdf, other]

    cs.CR cs.CL

    ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization

    Authors: Wai Man Si, Michael Backes, Yang Zhang

    Abstract: In-context learning (ICL) is a recent advancement in the capabilities of large language models (LLMs). This feature allows users to perform a new task without updating the model. Concretely, users can address tasks during the inference time by conditioning on a few input-label pair demonstrations along with the test input. It is different than the conventional fine-tuning paradigm and offers more…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2311.14685  [pdf, other]

    cs.CY cs.CL cs.CR cs.LG

    Comprehensive Assessment of Toxicity in ChatGPT

    Authors: Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang

    Abstract: Moderating offensive, hateful, and toxic language has always been an important but challenging topic in the domain of safe use in NLP. The emerging large language models (LLMs), such as ChatGPT, can potentially further accentuate this threat. Previous works have discovered that ChatGPT can generate toxic responses using carefully crafted inputs. However, limited research has been done to systemati…

    Submitted 3 November, 2023; originally announced November 2023.

  3. arXiv:2308.03558  [pdf, other]

    cs.CR cs.CL

    Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing

    Authors: Wai Man Si, Michael Backes, Yang Zhang

    Abstract: The Machine Learning as a Service (MLaaS) market is rapidly expanding and becoming more mature. For example, OpenAI's ChatGPT is an advanced large language model (LLM) that generates responses for various queries with associated fees. Although these models can deliver satisfactory performance, they are far from perfect. Researchers have long studied the vulnerabilities and limitations of LLMs, suc…

    Submitted 7 August, 2023; originally announced August 2023.

  4. arXiv:2305.07406  [pdf, other]

    cs.CR cs.CL cs.LG

    Two-in-One: A Model Hijacking Attack Against Text Generation Models

    Authors: Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem

    Abstract: Machine learning has progressed significantly in various applications ranging from face recognition to text generation. However, its success has been accompanied by different attacks. Recently a new attack has been proposed which raises both accountability and parasitic computing risks, namely the model hijacking attack. Nevertheless, this attack has only focused on image classification tasks. In…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: To appear in the 32nd USENIX Security Symposium, August 2023, Anaheim, CA, USA

  5. arXiv:2209.03463  [pdf, other]

    cs.CY cs.AI cs.CR cs.SI

    Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

    Authors: Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang

    Abstract: Chatbots are used in many applications, e.g., automated agents, smart home assistants, interactive characters in online games, etc. Therefore, it is crucial to ensure they do not behave in undesired manners, providing offensive or toxic responses to users. This is not a trivial task as state-of-the-art chatbot models are trained on large, public datasets openly collected from the Internet. This pa…

    Submitted 9 September, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Journal ref: Published in ACM CCS 2022. Please cite the CCS version.

  6. arXiv:2105.15054  [pdf, other]

    cs.CL cs.AI

    Telling Stories through Multi-User Dialogue by Modeling Character Relations

    Authors: Wai Man Si, Prithviraj Ammanabrolu, Mark O. Riedl

    Abstract: This paper explores character-driven story continuation, in which the story emerges through characters' first- and second-person narration as well as dialogue -- requiring models to select language that is consistent with a character's persona and their relationships with other characters while following and advancing the story. We hypothesize that a multi-task model that trains on character dialo…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: In Proceedings of SIGDIAL 2021