Jailbreaking with Universal Multi-Prompts

Hsu, Yu-Ling; Su, Hsuan; Chen, Shang-Tse

doi:10.18653/v1/2025.findings-naacl.274

Journal

2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025

Start Page

4870

End Page

4891

Date Issued

2025-04

Author(s)

Hsu, Yu-Ling

Su, Hsuan

Chen, Shang-Tse

DOI

10.18653/v1/2025.findings-naacl.274

URI

https://www.scopus.com/record/display.uri?eid=2-s2.0-105028790543&origin=resultslist

https://scholars.lib.ntu.edu.tw/handle/123456789/737218

Abstract

Large language models (LLMs) have seen rapid development in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. Most prompting techniques focus on optimizing adversarial inputs for individual cases, which results in higher computational costs when dealing with large datasets. Less research has addressed the more general setting of training a universal attacker that can transfer to unseen tasks. In this paper, we introduce JUMP, a prompt-based method designed to jailbreak LLMs using universal multi-prompts. We also adapt our approach for defense, which we term DUMP. Experimental results demonstrate that our method for optimizing universal multi-prompts outperforms existing techniques.

Event(s)

2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025

Publisher

Association for Computational Linguistics

Type

conference paper
