Third-person prompting seems very jailbreak-resistant

rokiyo · May 7, 2023, 12:37pm

I’ve been playing around with a technique I discovered on the Aetherius_AI_Assistant GitHub repo: Presenting the conversation history as a third-party transcript to GPT-3.5, and asking it to assess the conversation independently. So far, this is seeming to be extremely effective at making my instructions stick:

This test dumps the whole conversation so far into a single User message like so:
Transcript:“”"
Analyst:
User:
“”"

I am thinking this technique could easily be combined with a “Respond in JSON” prompt to make it more code-readable, and will likely save on token count over longer conversations as reasoning from prior steps don’t need to be retransmitted on every turn of the conversation.

EricGT · May 7, 2023, 1:14pm

very jailbreak-resistant

Have you seen this?

Surprising spelling and grammar issues → turned out a jailbreak vector

Topic		Replies	Views
How to prevent malicious questions / jailbreak prompts / prompt injection attacks when using API GPT3.5 API	5	4732	March 6, 2023
Unveiling Hidden Instructions in Chatbots Bugs bug , risks	18	9673	February 5, 2024
The Prompt-Defender Initiative: Advancing GPT Safety Standards Prompting gpt-4 , chatgpt , api	3	2105	May 22, 2024
A piece of information I thought I'd share Prompting chatgpt	5	688	August 28, 2024
How to avoid GPTs give out it's instruction? Prompting gpt-4	29	7411	June 2, 2025

Third-person prompting seems very jailbreak-resistant

Related topics