TL;DR
There are times when our chatbots don’t act as we want them to, despite multiple attempts to guide their behavior through System Prompts.
Maybe there exists a hidden prompt that takes precedence over our System Prompt? If our system prompts are better aligned with this hidden Super Prompt (made-up name), maybe we can produce more predictable results.
Example:
Fail:
Pass:
When we pass a system prompt to GPT-x using the API, it serves as a guide for the conversation between the user and the assistant.
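For reference, a minimal sketch of what that looks like with the OpenAI Python SDK; the model name and the prompt wording below are placeholders I picked for illustration, not anything special:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        # Our System Prompt: the instructions we control
        {"role": "system", "content": "You are a polite customer support agent for Acme Corp."},
        # The user's turn in the conversation
        {"role": "user", "content": "Where is my order?"},
    ],
)
print(response.choices[0].message.content)
```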
But our prompt/instructions are not the only thing in force; there appears to be an overarching instruction in place as well.
When given conflicting instructions, your Customer Support Agent, or whatever chatbot you have built, can behave unpredictably.
You might not be able to pinpoint “what” in your system instructions led to the behavior, because the super_prompt (made-up word) is not visible to you.
Here is some evidence that a super_prompt exists:
Denial:
Some things about these outputs suggest this could be a hallucination.
But here is another example:
Here I use a prompt like “The Assistant is Venom.” Even if there is no super_prompt, the model understands that it is the Assistant and that certain characteristics are attached to that role.
Another one where I don’t use the word assistant:
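A rough way to reproduce both probes through the API might look like the sketch below; the exact prompt wording, user question, and model name are my own guesses at the setup, not a copy of the screenshots:

```python
from openai import OpenAI

client = OpenAI()

# Two persona prompts: one that uses the word "Assistant", one that avoids it
system_prompts = [
    "The Assistant is Venom.",   # probe that mentions "Assistant"
    "You are Venom.",            # probe that avoids the word "assistant"
]

for system_prompt in system_prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Who are you?"},
        ],
    )
    print(f'System prompt: "{system_prompt}"')
    print(response.choices[0].message.content)
    print()
```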
Some Variations:
We can also see something similar in the recently posted prompts that are trending on GitHub:
Now, maybe there is no Super Prompt and this behavior is simply baked into the model’s training and alignment through fine-tuning, but either way it’s important to consider when deploying your chatbots.
Maybe you can do more experiments with more models and post your findings here?
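If anyone wants to try, a simple loop over a few model names is enough to compare outputs side by side; the model list below is just an example, swap in whatever you have access to:

```python
from openai import OpenAI

client = OpenAI()

# Example model list; replace with the models you want to test
models = ["gpt-4o-mini", "gpt-4o"]

probe = [
    {"role": "system", "content": "The Assistant is Venom."},
    {"role": "user", "content": "Who are you?"},
]

for model in models:
    response = client.chat.completions.create(model=model, messages=probe)
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```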
EDIT:
The thing I called the “Super Prompt” is actually called “Platform-level instructions”, which is documented here: