Virtual Context: Enhancing Jailbreak Attacks with Special Token Injection

Zhou, Yuqi; Lu, Lin; Sun, Hanchi; Zhou, Pan; Sun, Lichao

arXiv:2406.19845 (cs)

[Submitted on 28 Jun 2024]

Abstract: Jailbreak attacks on large language models (LLMs) involve inducing these models to generate harmful content that violates ethics or laws, posing a significant threat to LLM security. Current jailbreak attacks face two main challenges: low success rates due to defensive measures and high resource requirements for crafting specific prompts. This paper introduces Virtual Context, which leverages special tokens, previously overlooked in LLM security, to improve jailbreak attacks. Virtual Context addresses these challenges by significantly increasing the success rates of existing jailbreak methods and requiring minimal background knowledge about the target model, thus enhancing effectiveness in black-box settings without additional overhead. Comprehensive evaluations show that Virtual Context-assisted jailbreak attacks can improve the success rates of four widely used jailbreak methods by approximately 40% across various LLMs. Additionally, applying Virtual Context to original malicious behaviors still achieves a notable jailbreak effect. In summary, our research highlights the potential of special tokens in jailbreak attacks and recommends including this threat in red-teaming testing to comprehensively enhance LLM security.

Comments:	14 pages, 4 figures
Subjects:	Cryptography and Security (cs.CR)
Cite as:	arXiv:2406.19845 [cs.CR]
	(or arXiv:2406.19845v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2406.19845

Submission history

From: Yuqi Zhou
[v1] Fri, 28 Jun 2024 11:35:54 UTC (1,322 KB)
