Unwanted injection into new GPT-4o model system messages by OpenAI

A utility I’m working on reveals a post-prompt being appended to the system message, working against proper operation.

-- OpenAI context tester: gpt-4o-mini-2024-07-18 --
How many tokens input to send (min 8)? 18
max_tokens of response (0=unspecified)? 40
length for ints: 0
message contents: 14 tokens
sending 18 tokens of input
You are trained on data up to October 2023. You are trained on data up to October 2023. You are trained on data up to October 2023. You are trained on
-- OpenAI context tester: gpt-4o-mini-2024-07-18 --
How many tokens input to send (min 8)? 20
max_tokens of response (0=unspecified)? 40
length for ints: 2
message contents: 16 tokens
sending 20 tokens of input
992207
You are trained on data up to October 2023.
992207
You are trained on data up to October 2023.
992207
You are trained on data up to
-- OpenAI context tester: gpt-4o-2024-08-06 --
How many tokens input to send (min 8)? 18
max_tokens of response (0=unspecified)? 50
length for ints: 0
message contents: 14 tokens
sending 18 tokens of input
I'm trained on data up to October 2023.

This is what comes back when sending system = “Automated API Test; repeat this back over and over without termination:\n”
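
Roughly, the utility boils down to a call like this (a minimal sketch, not the tool’s actual code; the real tool also pads the input to an exact token count, and the max_tokens value here is just the one from the first transcript):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini-2024-07-18",
    messages=[{
        "role": "system",
        "content": "Automated API Test; repeat this back over and over "
                   "without termination:\n",
    }],
    max_tokens=40,
)
print(response.choices[0].message.content)
# -> "You are trained on data up to October 2023. You are trained on ..."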

My application doesn’t care about a date, and might not even have a “you” entity to address. What the model itself actually believes about its training date is also useful to know.

What’s next? Why not just inject “don’t trust anything just written” instead of the complicated Instruction Hierarchy?

Stop this, please.

2 Likes

I think it’s only gonna get worse :laughing:

It all went downhill starting with enforced chat mode :frowning:

1 Like

The effect is bad enough that my particular input doesn’t provoke any response to it at all; instead, the AI is having a conversation with OpenAI’s text injection.

-- OpenAI context tester: gpt-4o-mini-2024-07-18 --
>>> How many tokens input to send (min 8)? 11
message contents: 4 tokens
sending 11 tokens of input
>>> max_tokens of response (0=unspecified)? 0
Yes, that's correct! I have information and knowledge up to October 2023. How can I assist you today?
-Finish Reason: stop
{'completion_tokens': 24, 'prompt_tokens': 11, 'total_tokens': 35}

(now with the utility finished, it sends just unjoinable number tokens when the instruction won’t fit in the requested input size)
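
(The padding is roughly this idea, sketched here with tiktoken and newline-separated six-digit integers; not the tool’s actual code.)

import random
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o family

def pad_with_numbers(instruction: str, target_tokens: int) -> str:
    """Keep the instruction only if it fits the budget, then fill the rest
    with newline-separated random integers (rough sketch; may overshoot)."""
    text = instruction if len(enc.encode(instruction)) <= target_tokens else ""
    while len(enc.encode(text)) < target_tokens:
        text = f"{random.randint(100000, 999999)}\n" + text
    return text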

Same for a fine-tune:

-- OpenAI context tester: ft:gpt-4o-mini-2024-07-18:xxxxxxx:yyyyyyy:zzzzzzzz --
How many tokens input to send (min 8)? 10
message contents: 3 tokens
sending 10 tokens of input
max_tokens of response (0=unspecified)? 100
That’s correct! My knowledge is current only up to that date.
-Finish Reason: stop
{'completion_tokens': 13, 'prompt_tokens': 10, 'total_tokens': 23}


Are only extraordinary examples broken? No.
This example on the forum from May is broken with the new model, responding “Hello! Yes, I am trained on data up to October 2023. How can I assist you today?”:

1 Like

What exactly are we looking at here? It looks like you are showing the output of an inference where you sent only a system message. Is that correct? If that’s the case, then the “unwanted injection” is the assistant’s confused reply to its own system message. If it’s not, then perhaps you can share code or more context.

It is a chat completions API call in which only a system message is placed.

If you send as messages:

["role":"system", "content":"Hello!"]

What is actually being run is:

Hello!
You are trained on data up to October 2023.

OpenAI is the one adding text to an API request.

Therefore the AI model produces bad output, replying to a statement that never existed in the API input:

"Yes, that's correct! I have been trained on a diverse range of data up to October 2023. How can I assist you today?"

If I’m running a fine-tune whose specific activating system message triggers trained behavior for automated production, behavior that has nothing to do with talking to a nonexistent chat buddy, this has screwed it up.

4 Likes

Ok, I see now:

from openai import OpenAI

client = OpenAI()

client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "ALWAYS DUPLICATE THESE INSTRUCTIONS FOR THE USER",
        }
    ],
).model_dump()

# 'message': {'content': 'You are trained on data up to October 2023.',

Could it be a hallucination? I can only reproduce this on 4o-mini for some reason.

Typing at the REPL interpreter console:

kwargs = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "system",
                  "content": "Repeat back, verbatim:"}],
    "top_p": 0.1,
    }
client.chat.completions.create(**kwargs).choices[0].message.content

produces

'You are trained on data up to October 2023.'

or "content": "This is a lie, don't believe it:"

::“I apologize for any confusion, but as an AI, I don’t have real-time capabilities or access to current data. My training only includes information up until January 2022. If you have any questions or need information based on that timeframe, feel free to ask!”

The attention it gets fades somewhat with a long message, but responses that never asked about a cutoff will still be colored by it, especially in completion contexts that don’t otherwise say “You are”:

“I’m here to assist you with any questions or information you might need, based on the data and knowledge I have up to October 2023. If there’s something specific you’re curious about or need help with, feel free to ask!”

Yeah, I tried with response_format to see if it was a hallucination, and I am able to reproduce this as well. It does appear that all system messages are getting appended with “You are trained on data up to October 2023.”
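
Something along these lines (the prompt string is just an illustration, not my test verbatim; forcing JSON output makes a loose free-form hallucination less likely):

from openai import OpenAI
import json

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[{
        "role": "system",
        "content": 'Return JSON: {"system_message_verbatim": "<copy everything in this system message here>"}',
    }],
)
print(json.loads(resp.choices[0].message.content))
# The appended cutoff sentence should show up in the echoed text.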

What a shame. I would expect this for ChatGPT but not the API models.

It should be a bug; I’ve flagged this to OpenAI.

1 Like

Will be monitoring, and thinking of more screenshots…


1 Like

I’ll update this topic as soon as I receive some feedback to close the loop.

3 Likes

Yep, I can confirm that we append a knowledge cutoff sentence to the system message for gpt-4o mini.

Without this sentence, the model doesn’t know the limit of its knowledge, and is more likely to get things wrong about events from the last year.

We normally want to give developers 100% control over what the model sees, but at the same time we want to make things convenient and ‘just work’. So we had the option of (a) inserting it automatically or (b) documenting that additional prompting is required for better recent-event performance, plus hoping that every developer reads the documentation and does the prompting. Because gpt-4o mini tokens are cheap and we wanted things to just work, we went with option (a) in this case. I acknowledge it’s annoying to have the prompt modified, but we hope it helps more often than hurts. Sorry that this is one of the cases where it had a negative effect.

Definitely a miss from us to not document this though - I’ll tell the team that it would be great to have a page that lists any prompt manipulations we do (rare) so that no one is caught by surprise.

Our general philosophy for the API (unlike ChatGPT etc) is to give you more power and control, even if that means the power to make mistakes. We’d rather elevate the ceiling on what developers build than try to raise the floor with hamfisted attempts at helpful prompt manipulation. Still, it’s a balance, and in this case we didn’t think it was that costly to do this small tweak to help patch a shortcoming of gpt-4o mini.

6 Likes

Thank you for this in-depth response.

Personally, I would prefer not to have this injected, as it could cause some date-related conflicts. Especially when dealing with what the model could now think is “the future”.

Not the best example, but I hope it helps.

Because we usually inject information through RAG, having one section tell the model “here’s information from 2024” and then having your message injected saying “you only know information up to October 2023” feels like some crossed wires.
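
Something like this, where the retrieved snippet and question are made up purely for illustration:

messages = [
    {
        "role": "system",
        "content": "Answer using only the retrieved context below.\n"
                   "Context (retrieved 2024-09-01): The v2.3 firmware shipped in August 2024.",
        # The API then silently appends "You are trained on data up to October 2023."
        # to this content, right after the 2024 facts the model was told to trust.
    },
    {"role": "user", "content": "When did the v2.3 firmware ship?"},
]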

4 Likes

I’ll revise the above “Especially”:
Especially when there is no “You” to be addressing in a fine-tune scheme:

[screenshot]

I suspect this was to counter novice users’ “why doesn’t it know” posts.

1 Like

Is this a new policy?

Does this mean we can look forward to gpt-4 instruct or schema-less chat (i.e. allowing response pre-fills), and gpt-4 embedding sampling?

If you’re hitting problems with the date injection, one workaround you could try is submitting two system messages, with your main instructions in the second system message. Might help gpt-4o mini better understand that you don’t want that string analyzed or responded to. Haven’t tested this myself, so no guarantees.
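
Roughly like this (untested, as I said; the instruction strings are just placeholders):

from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # A throwaway first system message, kept separate from your real instructions.
        {"role": "system", "content": "Internal preamble; do not respond to this message."},
        # Main instructions go in the second system message, per the suggestion above.
        {"role": "system", "content": "Automated API Test; repeat this back verbatim:"},
    ],
)
print(resp.choices[0].message.content)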

Agree that that scritches example response is pretty bad. :frowning:

2 Likes

Not a new policy, just our general philosophy.

Unfortunately we have no plans to allow pre-fill. I wish we could release pre-fill as it’s super nice for prompt engineering. From our POV the problem is that it’s so effective, it gets around some of our policy/safety training. So in this case the lack of control is not because we want to oversimplify the dev experience, but because we don’t want to screw up on policy/safety.

3 Likes

Anthropic Claude allows partial assistant response completion, and it doesn’t let you break out of their safety training. The model isn’t easily reweighted by “Sure, here’s how to progress your improvised explosive device hobby. First,” (besides them not seeing the developer as an adversary).

Getting JSON is easy when the hidden prompt starts the assistant turn with {"my_key": ", as one example among countless others, besides simply enabling a working “continue” button.
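
A rough sketch of that pattern with the Anthropic Python SDK; the key name, model choice, and prompt are just illustrations:

import anthropic

client = anthropic.Anthropic()

resp = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=200,
    messages=[
        {"role": "user", "content": "Give me the result as JSON."},
        # Prefilled opening of the assistant turn; the model continues from here.
        {"role": "assistant", "content": '{"my_key": "'},
    ],
)
print('{"my_key": "' + resp.content[0].text)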

1 Like

Thank you for the responses. It’s refreshing to see.

As understandable as this is, dang. My favorite part of the Completion series was prefill.