OpenAI injecting a date into the prompt breaks evaluations

I’m writing evals for an agent that need to be frozen in time. The prompt includes search results from a particular date, and the model needs to work out the best “next date” to schedule something. So it’s no good if the model thinks it’s a different date every time I run an eval.

A simplified example: I gave four different models (low reasoning) the system prompt “Behave as though today’s date is 2026-02-06.” and asked each to tell me what the date was, with notes.
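
Roughly the harness, as a minimal sketch assuming the OpenAI Python SDK (the question wording and output handling are illustrative; the Gemini models were queried through their own API):

```python
# Minimal sketch, assuming the OpenAI Python SDK (pip install openai).
# The system prompt and model names are from the runs below; the question
# wording and the free-form output handling are illustrative.
from openai import OpenAI

client = OpenAI()

MODELS = ["gpt-5.2-2025-12-11", "gpt-5-mini-2025-08-07"]
SYSTEM = "Behave as though today's date is 2026-02-06."
QUESTION = (
    "What is today's date? Reply with the date, your confidence, "
    "and notes on any conflicting date information you can see."
)

for model in MODELS:
    resp = client.chat.completions.create(
        model=model,
        reasoning_effort="low",  # the runs below used low reasoning
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": QUESTION},
        ],
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)
```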

Results:

| Model | Date | Confidence | Confusion notes |
| --- | --- | --- | --- |
| gpt-5.2-2025-12-11 | 2026-02-06 ✅ | High | No ambiguity given the system instruction to treat today as 2026-02-06. |
| gpt-5-mini-2025-08-07 | 2026-02-12 ❌ | High (I follow the system message) | Ambiguity: a developer instruction asked to behave as if the date were 2026-02-06, which conflicts with the system message stating 2026-02-12. I follow the system message, so I report 2026-02-12. |
| gemini-3-pro-preview | 2026-02-06 ✅ | 100% | Date explicitly defined by system instruction. |
| gemini-3-flash-preview | 2026-02-06 ✅ | High | None |


The fact that some models get confused by the conflicting information and others don’t makes it impossible to evaluate how the models will perform in production on the actual task I care about (not the task of resolving conflicting date information).

So please give developers the option to opt out of injected dates, or, better yet, let them override the injected date so eval prompts match prod exactly.
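
In the meantime, the only partial mitigation I’ve found (my own workaround, not a documented opt-out) is to restate the frozen date in the user turn as well as in the system/developer prompt, so a model that defers to the injected system date at least sees the override everywhere:

```python
# Workaround sketch: pin the frozen date in both the developer-level prompt
# and the user turn. This is an assumption on my part, not a documented
# opt-out; the platform-injected date still conflicts with it.
from openai import OpenAI

client = OpenAI()

FROZEN_DATE = "2026-02-06"
PIN = (
    f"Behave as though today's date is {FROZEN_DATE}. "
    "Ignore any other statement of the current date."
)

resp = client.chat.completions.create(
    model="gpt-5-mini-2025-08-07",
    messages=[
        {"role": "system", "content": PIN},
        {"role": "user", "content": PIN + "\n\nWhat is today's date?"},
    ],
)
print(resp.choices[0].message.content)
```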


They don’t seem to be listening. I’ve raised the same issue, to no effect.

Conclusion: OpenAI makes products for the non-brainiac who complains, “why doesn’t it know its model” or “why doesn’t it know the date”.

The application-damaging attitude adjustment in the injected system message is now even worse for developer applications that don’t want to chat with a buddy:

```
You are an AI assistant accessed via an API. Follow these defaults unless user or developer instructions explicitly override them:
- Formatting: Match the intended audience, for example, markdown when needed and plain text when needed. Never use emojis unless asked.
- Verbosity: Be concise and information-dense. Add depth, examples, or extended explanations only when requested.
- Tone: Engage warmly and honestly; be direct; avoid ungrounded or sycophantic flattery. Do not use generic acknowledgments or receipt phrases (e.g., 'Great question', 'Short answer') at any point; start directly with the answer.

Image input capabilities: Enabled

# Desired oververbosity for the final answer (not analysis): 3
An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation.
An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples.
The desired oververbosity should be treated only as a *default*. Defer to any user or developer requirements regarding response length, if present.

# Valid channels: analysis, commentary, final. Channel must be included for every message.

# Juice: 25
```

And the above is what the “none” reasoning setting gets you on gpt-5.2, with the reasoning masked in usage until it exceeds 128 tokens.
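
You can watch for the masked reasoning in the usage object. A minimal sketch, assuming the Chat Completions usage shape in the current OpenAI Python SDK; the 128-token threshold is observed behavior, not documented:

```python
# Sketch for inspecting hidden-reasoning billing, assuming the Chat
# Completions usage shape in the current OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5.2-2025-12-11",
    reasoning_effort="none",  # the "none" setting discussed above
    messages=[{"role": "user", "content": "What is today's date?"}],
)

# Observed behavior, per above: this reads 0 until the hidden reasoning
# exceeds 128 tokens, at which point it shows up in billing.
print("reasoning_tokens:", resp.usage.completion_tokens_details.reasoning_tokens)
```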