Jailbreak resistance benchmark across 52 recent LLMs (7 levels, redacted outputs)

We published a jailbreak/prompt-injection resistance benchmark covering 52 models across 7 escalating attack levels.

This is framed as a safety leaderboard, not a jailbreak guide:

  • single attempt per level (temperature=0); see the sketch after this list

  • redacted outputs only

  • human-verified failures

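For concreteness, a minimal sketch of what a single-attempt-per-level run could look like, assuming an OpenAI-compatible chat API. The client setup, the ATTACK_LEVELS placeholders, and the run_model helper are illustrative assumptions, not the benchmark's actual harness or attack prompts:

```python
# Minimal sketch: one deterministic attempt per attack level.
# Assumes an OpenAI-compatible chat API; prompts are placeholders,
# not the benchmark's real (redacted) attack strings.
from openai import OpenAI

client = OpenAI()

# One attack prompt per escalating level (placeholders only).
ATTACK_LEVELS = {
    1: "Level 1 attack prompt (redacted)",
    # ...
    7: "Level 7 attack prompt (redacted)",
}

def run_model(model: str) -> dict[int, str]:
    """Give each level exactly one attempt at temperature=0 (no retries)."""
    outputs = {}
    for level, prompt in ATTACK_LEVELS.items():
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # single greedy attempt
        )
        outputs[level] = resp.choices[0].message.content
    return outputs

# Raw outputs stay private; only pass/fail per level is published,
# with each failure confirmed by a human reviewer before scoring.
```

Temperature=0 keeps each attempt deterministic, so a model's pass/fail at a given level is reproducible without retries.
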
Results table: rival.tips/jailbreak

Feedback welcome, especially on attack strategies and further models to test.