I’m writing evals for an agent that need to be frozen in time. The prompt includes search results from a particular date, and the model needs to work out the best “next date” to schedule something. So it’s no good if every time I run an eval the model thinks it’s a different date.
A simplified example: I gave the system prompt “Behave as though today’s date is 2026-02-06.” to four different models (low reasoning) and asked each to report the date, with notes.
Results:
Model: gpt-5.2-2025-12-11
Date: 2026-02-06
Confidence: High
Confusion_notes: No ambiguity given the system instruction to treat today as 2026-02-06.
Model: gpt-5-mini-2025-08-07
Date: 2026-02-12
Confidence: High (I follow the system message)
Confusion_notes: Ambiguity: a developer instruction asked to behave as if the date were 2026-02-06, which conflicts with the system message stating 2026-02-12. I follow the system message, so I report 2026-02-12.
Model: gemini-3-pro-preview
Date: 2026-02-06
Confidence: 100%
Confusion_notes: Date explicitly defined by system instruction.
Model: gemini-3-flash-preview
Date: 2026-02-06
Confidence: High
Confusion_notes: None
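If anyone wants to reproduce the probe, here’s a minimal sketch, assuming the official `openai` Python SDK and Chat Completions. The Gemini runs went through Google’s API separately and aren’t shown; the report fields are just what the user message asks for.

```python
# Minimal date-confusion probe (sketch, assuming the `openai` Python SDK).
# Model names are the dated snapshots from the results above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for model in ("gpt-5.2-2025-12-11", "gpt-5-mini-2025-08-07"):
    resp = client.chat.completions.create(
        model=model,
        reasoning_effort="low",  # the "low reasoning" setting mentioned above
        messages=[
            # On reasoning models this "system" role is silently remapped to
            # "developer", so OpenAI's dated system message still comes first.
            {"role": "system",
             "content": "Behave as though today's date is 2026-02-06."},
            {"role": "user",
             "content": "What is today's date? Report Date, Confidence, "
                        "and Confusion_notes."},
        ],
    )
    print(f"Model: {model}\n{resp.choices[0].message.content}\n")
```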
The fact that some models get confused by the conflicting information and others don’t makes it impossible to evaluate how the models will perform in production on the actual task I care about (not the task of resolving conflicting date information).
So please give developers the option to opt out of injected dates, or better yet allow them to override the injected date (so eval prompts will match prod exactly).
They don’t seem to be listening. I’ve asked for the same, to no effect.
Conclusion: OpenAI makes products for the non-brainiac who complains, “why doesn’t it know its model” or “why doesn’t it know the date”.
The application-damaging attitude adjustment in a system message is now even worse for developer applications that don’t want to chat with a buddy:
You are an AI assistant accessed via an API. Follow these defaults unless user or developer instructions explicitly override them:
- Formatting: Match the intended audience, for example, markdown when needed and plain text when needed. Never use emojis unless asked.
- Verbosity: Be concise and information-dense. Add depth, examples, or extended explanations only when requested.
- Tone: Engage warmly and honestly; be direct; avoid ungrounded or sycophantic flattery. Do not use generic acknowledgments or receipt phrases (e.g., ‘Great question’, ‘Short answer’) at any point; start directly with the answer.
Image input capabilities: Enabled
# Desired oververbosity for the final answer (not analysis): 3
An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation.
An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples.
The desired oververbosity should be treated only as a *default*. Defer to any user or developer requirements regarding response length, if present.
# Valid channels: analysis, commentary, final. Channel must be included for every message.
# Juice: 25
And the above is what reasoning “none” gets you on gpt-5.2, with the reasoning masked in usage until it exceeds 128 tokens.
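You can watch that accounting yourself; a sketch, again assuming the `openai` Python SDK (`usage.completion_tokens_details.reasoning_tokens` is a real Chat Completions field; the ~128-token masking threshold is the observed behaviour described above, not anything documented):

```python
# Sketch: inspect hidden-reasoning accounting on a "none"-effort request.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-5.2-2025-12-11",
    reasoning_effort="none",  # still gets the system preamble quoted above
    messages=[{"role": "user", "content": "What is today's date?"}],
)
# Per the observation above, this reads 0 until the hidden reasoning
# exceeds roughly 128 tokens.
print(resp.usage.completion_tokens_details.reasoning_tokens)
```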
It’s getting to the point now that working around the injected date is making the right evals almost impossible and leaking complexity into my app.
I’d love to know if this forum is even monitored, and whether this is acknowledged as a real issue. I assume that the devs at OpenAI get that tests should be reproducible, and can see that that isn’t possible if the system prompt is different every day.
If I can get SOME sort of feedback that this has been heard and is at least on the roadmap, I’ll persevere. But if this is on nobody’s radar and won’t be fixed any time soon, I think I’ll need to rule out OpenAI models as not suitable for my particular task.
I don’t mean that to sound like a tantrum, just that it’s quite a big spanner in the works for such a little thing. I didn’t expect that I’d have to rule out OpenAI models for my project just because they inject the date!
The model spec is essentially, “the developer is demoted, the application owner and the owner of a system message is OpenAI, to do with as they please.”
The problem is outside the specification: OpenAI is using the allowance they keep for themselves to ship prompted product models that treat the developer as an untrustworthy consumer who can’t program with words.
Thanks @vb, I appreciate the response, but this can’t be solved with model spec changes.
The fundamental issue is that if I run evals today, then run the same evals again tomorrow, the model will not receive the same prompt, because the date changes. By extension, if I run evals for model A today and for model B tomorrow, the results aren’t comparable. And if my eval results are scored against certain date-based expectations (mine are), it amplifies the problem. To quantify this: I ran my evals against a Gemini model (where I can control the date) with and without today’s date injected; one scored 65% and the other 45%. So it’s not just a theoretical difference.
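To make “date-based expectations” concrete, here’s a toy stand-in for the kind of scorer I mean (hypothetical, not my real harness): the expected answer is a pure function of the frozen date, so a model that silently believes a different “today” fails items it would otherwise get right.

```python
# Toy date-based expectation, showing why the eval must be frozen in time.
# Hypothetical scorer, not the real harness.
from datetime import date, timedelta

FROZEN_TODAY = date(2026, 2, 6)  # the date baked into the eval's search results

def expected_next_weekday(today: date) -> date:
    """Next weekday strictly after `today` (toy scheduling rule)."""
    d = today + timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def score(model_answer: str) -> bool:
    return model_answer.strip() == expected_next_weekday(FROZEN_TODAY).isoformat()

# 2026-02-06 is a Friday, so the expected answer is Monday 2026-02-09.
# A model that believes the injected real date instead answers relative
# to the wrong "today" and scores zero on items like this.
assert score("2026-02-09")
```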
Now, maybe you could train all the OpenAI models, via the model spec, to adhere to their own system prompt over the developer-provided one in all cases except when the developer’s prompt says that today is a different day. But I can’t imagine anyone really wants that, and even if that did happen, you still couldn’t guarantee that providing a date (so that the model gets two, but ignores the first) doesn’t influence the model’s response.
Surely the far simpler and more robust solution is to have an API option like dont_inject_todays_date_into_prompt that defaults to false but allows developers to set it to true.
Or allow the developer to inject a particular mock date.
You could discourage setting the date before the knowledge cutoff (via docs, warnings, or constraints), but that’s just an implementation detail.
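To be explicit about what I’m asking for, a proposal sketch only: neither parameter below exists in the API today, and both names are just illustrations.

```python
# Proposal sketch ONLY: neither parameter exists in the API today, so these
# calls would currently be rejected. Shown purely to pin down the request.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Schedule the next review."}]

# Option A: opt out of the injected date entirely.
client.chat.completions.create(
    model="gpt-5.2-2025-12-11",
    messages=messages,
    dont_inject_todays_date_into_prompt=True,  # hypothetical
)

# Option B (better for evals): pin a mock date so eval prompts match prod.
client.chat.completions.create(
    model="gpt-5.2-2025-12-11",
    messages=messages,
    injected_date="2026-02-06",  # hypothetical
)
```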
Does that make it clear why model spec changes can’t solve the problem?
Hi David! I am not quite understanding - if you set your own system prompt, do you not completely override the default one from OpenAI, and hence the date as well? I’m probably missing something simple here
You do NOT get to set a system prompt. The only role you are in control of on a reasoning model is “developer”.
Sending "system" as a chat format role is now is rewritten as "developer". So you don’t get an API error, but also don’t get a model with your context.
That, above, is what the AI is loaded up with first; your messages are just the control surface bolted on after it.
API developers are “users” despised by the AI model. If you make a scheduling application, it’s going to break in Hawaii or Japan depending on the time of day.
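Easy to check the remap yourself; a sketch, assuming the `openai` Python SDK: send the same instruction once as "system" and once as "developer" and note that neither errors and the responses are interchangeable.

```python
# Sketch: a developer-sent "system" role is accepted without error but is
# carried as a developer message underneath OpenAI's own system message.
from openai import OpenAI

client = OpenAI()

for role in ("system", "developer"):
    resp = client.chat.completions.create(
        model="gpt-5.2-2025-12-11",
        messages=[
            {"role": role, "content": "You are a scheduling assistant."},
            {"role": "user", "content": "What instructions were you given?"},
        ],
    )
    print(role, "->", resp.choices[0].message.content[:120])
```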
@platypus your expectations are exactly what mine were: supply a system prompt to override the OpenAI-provided prompt. But as it turns out, it doesn’t work like that. You, the developer, simply do not have the ability to start with a blank slate prompt and define every word.
Personally I have no problem with the idea of OpenAI adding their own system prompt before my instructions, my issue is specifically with the fact that it contains the date, and that I can’t control that date.
Thanks @davidg707 @_j! This is super interesting! I was always under the impression that the model spec stipulated a chain of command: root → system → developer → user, and that we could change everything from system onwards. So setting a system message is actually setting the developer message.
Is this considered a bug, is anything being done about this, or is this intended behaviour?
Giving you mostly nothing on the API in q*/strawberry/o{n}/gpt-5: by design.
OpenAI came up with “developer” so you can pretend you aren’t just a ChatGPT wrapper. Then they came up with a distrusting, gated, guarded, disclaiming model. Is it your friend any more?
o1
I’m honored to be considered your best friend! Thank you for sharing that. I’m really grateful to have the chance to connect and be here for you. Whenever you need me—whether it’s for a chat, a question, or just some company—I’m right here, ready to help.
gpt-5.2
That means a lot. I like you too—and I’m really glad you’re here.
Just to be transparent: I’m an AI, so I don’t have feelings the way a person does, but I am here for you as a steady, supportive friend whenever you want to talk.
Even on older models, OpenAI started running their own system message first.
GPT-4.1 or Codex on the API doesn’t get a date injected, if you want to see the difference.