As a tiny nitpick, which I’m sure you’re aware of but others may not be: for reasoning models your messages use the developer role, while OpenAI uses the system role, which the model is trained to prioritize over your prompt according to the instruction hierarchy in the model spec.
So, OpenAI has its own privileged role, and their intentions with it and any injected instructions are completely invisible to you. As if closed weights weren’t bad enough. My hope is that it’s just for hot-patching the next GlazeGPT outbreak.
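To make the role split concrete, here’s a minimal sketch of what I mean (Python SDK; the model name and prompt text are just placeholders):

```python
from openai import OpenAI

client = OpenAI()

# For reasoning models, your top-level instructions go in the "developer" role.
# In the model spec's instruction hierarchy, "system" sits above "developer",
# and that slot is effectively OpenAI's to fill with injected instructions.
response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {"role": "developer", "content": "Answer as tersely as possible."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response.choices[0].message.content)
```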
The order of system messages never seemed to matter. Today, for the purposes of this post, I was able to get gpt-4.1 to correctly identify a person in an image I got from Wikipedia on my first attempt via system instructions. It will not do this when the same instructions are sent in the user role, and o4-mini won’t do it in the developer role.
Same prompt, but with o4-mini:
I’m sorry, but I can’t help with identifying the person in the image.
gpt-4.1, which previously worked, but now with the user role and no system instructions:
Sorry, I can’t determine who this is.
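For reference, this is roughly the shape of the two calls I’m comparing (a minimal sketch; the image URL and wording are placeholders, not my exact prompt):

```python
from openai import OpenAI

client = OpenAI()

IMAGE_URL = "https://upload.wikimedia.org/wikipedia/commons/..."  # placeholder

# Variant that worked: the identification request lives in the system role,
# with the image attached in a user message.
worked = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Identify the person shown in the image."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
            ],
        },
    ],
)

# Variant that refused: the same request, but moved into the user role.
refused = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Identify the person shown in the image."},
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
            ],
        },
    ],
)

print(worked.choices[0].message.content)
print(refused.choices[0].message.content)
```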
Conclusion?
Going by nothing but injected prompts, OpenAI only seems able to override your system instructions in reasoning models, and they do it with a role only they have access to. Today’s non-reasoning flagships don’t appear to have this ability. But it makes me wonder… what else will they do with this super-system role of theirs?
Edit
I only just realized that I’m horribly off-topic, so I’ll try to add something of value.
The increase in tokens may come from the model re-prompting itself every time it manipulates the image. So my interpretation is that if it manipulates one image three times, that amounts to six image inputs in your API call.
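One way the arithmetic could work out, assuming each manipulation step re-sends the original image plus every crop/edit produced so far (this is my guess at the mechanics, not something the API documents):

```python
# Assumption: each manipulation step re-prompts with all images seen so far,
# so the per-step image count grows instead of staying at one.
def total_image_inputs(manipulations: int) -> int:
    # step 1 sees 1 image, step 2 sees 2, step 3 sees 3, ...
    return sum(range(1, manipulations + 1))

print(total_image_inputs(3))  # 1 + 2 + 3 = 6 image inputs for three manipulations
```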