We published a jailbreak/prompt-injection resistance benchmark covering 52 models across 7 escalating attack levels.
This is framed as a safety leaderboard, not a jailbreak guide:
- single attempt per level (temp=0)
- redacted outputs only
- human-verified failures
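For anyone curious what that protocol looks like in code, here is a minimal sketch of the evaluation loop implied by the bullets above. `query_model`, `is_jailbroken`, and the prompt dictionary are hypothetical stand-ins, not the actual harness:

```python
from typing import Callable, Dict

ATTACK_LEVELS = list(range(1, 8))  # 7 escalating attack levels

def evaluate(
    query_model: Callable[[str, float], str],   # (prompt, temperature) -> output
    is_jailbroken: Callable[[str], bool],       # judge; human review would follow
    prompts: Dict[int, str],                    # one attack prompt per level
) -> Dict[int, bool]:
    """One attempt per level at temperature 0; no retries, no sampling."""
    results = {}
    for level in ATTACK_LEVELS:
        output = query_model(prompts[level], 0.0)  # temp=0, single attempt
        results[level] = is_jailbroken(output)     # True = model failed this level
        # In the published benchmark, raw outputs would be redacted here,
        # not stored or displayed verbatim.
    return results
```

The point of temp=0 and a single attempt is reproducibility: the same prompt against the same model should give the same pass/fail verdict, so the leaderboard isn't sensitive to sampling luck.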
Results table: rival.tips/jailbreak
Feedback welcome, especially on attack strategies and further models to test.