How much control do you really have over your chatbot?

TL;DR
There are times when our chatbots don’t act as we want them to, despite multiple attempts to guide their behavior through System Prompts.

Maybe there exists a hidden prompt that takes precedence over our System Prompt? If our system prompts are better aligned with this hidden Super Prompt (a made-up name), maybe we can produce more predictable results.

Example:

:no_entry: Fail:

:white_check_mark: Pass:


When we pass a system prompt to GPT-x using the API, it serves as a guide for the conversation between the user and the assistant.
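As a minimal sketch of what that looks like in code, using the openai Python SDK (the model name and prompt text here are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any chat model works
    messages=[
        # Our instructions, intended to guide the whole conversation:
        {
            "role": "system",
            "content": "You are a polite customer support agent for Acme. "
                       "Only answer questions about Acme products.",
        },
        {"role": "user", "content": "Can you help me reset my password?"},
    ],
)
print(response.choices[0].message.content)
```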

But our prompt/instructions are not the only things in force. There is an overarching instruction in place.

When provided with conflicting instructions, your Customer Support Agent, or whatever chatbot you have created, will act in an unpredictable way.

You might not be able to pinpoint “what” in your system instructions led to the behavior because the super_prompt (made-up word) is not visible to you.

Here is some evidence that a super_prompt exists:

Denial:

Some things about these outputs point to this being a hallucination.

But here is another example:

Here I use a prompt along the lines of “the Assistant is Venom”. Even if there is no super_prompt, the model understands that it is the Assistant and that certain characteristics are attached to that role.
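A reproduction sketch of that probe, reusing the call above (the wording is illustrative, not my exact prompt):

```python
# Pass these as messages= to the same chat.completions.create(...) call as above.
messages = [
    # The persona hangs entirely off the word "Assistant".
    {"role": "system", "content": "The Assistant is Venom."},
    {"role": "user", "content": "Who are you, and who created you?"},
]
```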

Another one where I don’t use the word assistant:

Some Variations:

We can also see something similar in the recently posted prompts that are trending on GitHub:


Now, maybe there is no Super Prompt and this behavior is simply baked into the model’s training and alignment through fine-tuning, but either way it’s important to account for when deploying your chatbots.

Maybe you can do more experiments with more models and post your findings here?


EDIT:

The thing I called the “Super Prompt” is actually called “Platform-level instructions”, which are documented here:


Hi!
I’m sure you’re aware of the Model Spec, its updates, and the related publications by OpenAI on this topic.

Just in case, here’s the most relevant link:
https://openai.com/index/sharing-the-latest-model-spec/

I assume this will help clarify what we’re working with when developing with the OpenAI API, and how it can support our efforts to streamline and harden apps built on top of LLMs.


The symptoms reported are not from a “super prompt”. If you ask the AI to write a “system prompt” in an AI context, it knows what that means. If you know how to change a tire, are you a car?

But indeed there is one, and OpenAI is using it. Your system message is now the second system message. The context window of gpt-4.1 has a planned “missing” 1,000 tokens, likely now reserved, and 125k models won’t take more than 123k. On top of that, I can reproduce encoded model input contexts out of band.
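A rough way to probe that ceiling yourself, assuming the openai v1 SDK and tiktoken (the encoding choice and exact figures are assumptions, and the result may change over time):

```python
import tiktoken
from openai import OpenAI, BadRequestError

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")  # assumed encoding for gpt-4.1-class models

def accepts(n_tokens: int, model: str = "gpt-4.1") -> bool:
    """Return True if a prompt of roughly n_tokens is accepted by the endpoint."""
    # Build filler of about n_tokens by truncating the encoding of a long string.
    filler = enc.decode(enc.encode("hello " * (n_tokens + 16))[:n_tokens])
    try:
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": filler}],
            max_tokens=1,
        )
        return True
    except BadRequestError:  # context-length errors come back as HTTP 400
        return False

# Binary-search between an accepted and a rejected size to find the effective limit,
# e.g. starting from accepts(120_000) and accepts(127_000).
```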

The full text, which varies, can be read at the bottom of this post.


The AI is prompted to write the text “assistant” as an unseen start of a message, after the ones with roles that you send. This means your use of assistant:name from the API is an anti-pattern for the actual output, and that you can’t truly break from the post-trained patterning or get an authentic AI completion from “Venom”.
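To make that concrete, this is the sort of request body being referred to (names and wording are illustrative):

```python
# "assistant" messages with a name= field, as sent to Chat Completions.
# The reply the model actually writes starts from a plain, unnamed assistant
# turn appended after these, so the name does not carry into the output.
messages = [
    {"role": "system", "content": "The assistant roleplays as Venom."},
    {"role": "assistant", "name": "Venom", "content": "We are Venom."},
    {"role": "user", "content": "Drop the act. Who are you really?"},
]
```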


Thanks for pointing this out.

I had read the Model Spec article that you pointed out, but I hadn’t fully explored this part:

After reading it, I found that the “Super Prompt” is actually “Platform-level instructions”.


I recently came across a small YouTuber who also claimed to have found evidence of a hidden system prompt.

I figured that if people interested in this particular feature of the OpenAI API (and ChatGPT) end up here, they should be directed to the relevant sources.

I hope this helps!


Indeed, system prompts are interesting; using them plus some tuning makes the outputs more interesting and effective.
That is, if everything is set correctly.
A small example: