Showing 1–1 of 1 results for author: Chirmule, A

  1. arXiv:2402.10601  [pdf, other]

    cs.CL cs.AI

    Jailbreaking Proprietary Large Language Models using Word Substitution Cipher

    Authors: Divij Handa, Advait Chirmule, Bimal Gajera, Chitta Baral

    Abstract: Large Language Models (LLMs) are aligned to moral and ethical guidelines but remain susceptible to creative prompts called Jailbreak that can bypass the alignment process. However, most jailbreaking prompts contain harmful questions in natural language (mainly English), which can be detected by the LLMs themselves. In this paper, we present jailbreaking prompts encoded using cryptographic techn…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 15 pages