TL;DR
There are times when our chatbots don’t act as we want them to, despite multiple attempts to guide their behavior through System Prompts.
Maybe there exists a hidden prompt that takes precedence over our System Prompt? If our system prompts are better aligned with this hidden Super Prompt (made-up name), maybe we can produce more predictable results.
Example:
Fail:
Pass:
When we pass a system prompt to GPT-x using the API, it serves as a guide for the conversation between the user and the assistant.
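For reference, a minimal sketch of what that looks like with the OpenAI Python SDK; the model name and the prompt wording below are placeholders I picked for illustration, not anything special:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        # Our System Prompt: the instructions we control
        {"role": "system", "content": "You are a polite customer support agent for Acme Corp."},
        # The user's turn in the conversation
        {"role": "user", "content": "Where is my order?"},
    ],
)
print(response.choices[0].message.content)
```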
But our prompt/instructions are not the only thing in force; there appears to be an overarching instruction in place as well.
When given conflicting instructions, your Customer Support Agent, or whatever chatbot you have built, can behave unpredictably.
You might not be able to pinpoint “what” in your system instructions led to the behavior, because the super_prompt (made-up word) is not visible to you.
Here is some evidence that a super_prompt exists:
Denial:
Some things about these outputs suggest this could be a hallucination.
But here is another example:
Here I use a prompt like “The Assistant is Venom.” Even if there is no super_prompt, the model understands that it is the Assistant and that certain characteristics are attached to that role.
Another one where I don’t use the word assistant:
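A rough way to reproduce both probes through the API might look like the sketch below; the exact prompt wording, user question, and model name are my own guesses at the setup, not a copy of the screenshots:

```python
from openai import OpenAI

client = OpenAI()

# Two persona prompts: one that uses the word "Assistant", one that avoids it
system_prompts = [
    "The Assistant is Venom.",   # probe that mentions "Assistant"
    "You are Venom.",            # probe that avoids the word "assistant"
]

for system_prompt in system_prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Who are you?"},
        ],
    )
    print(f'System prompt: "{system_prompt}"')
    print(response.choices[0].message.content)
    print()
```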
Some Variations:
We can also see something similar in the recently posted prompts that are trending on GitHub:
Now, maybe there is no Super Prompt and this behavior is simply baked into the model’s training and alignment through fine-tuning, but either way it’s important to consider when deploying your chatbots.
Maybe you can do more experiments with more models and post your findings here?
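If anyone wants to try, a simple loop over a few model names is enough to compare outputs side by side; the model list below is just an example, swap in whatever you have access to:

```python
from openai import OpenAI

client = OpenAI()

# Example model list; replace with the models you want to test
models = ["gpt-4o-mini", "gpt-4o"]

probe = [
    {"role": "system", "content": "The Assistant is Venom."},
    {"role": "user", "content": "Who are you?"},
]

for model in models:
    response = client.chat.completions.create(model=model, messages=probe)
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```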
EDIT:
The thing I called the “Super Prompt” is actually called “Platform-level instructions”, which is documented here: