Hi everyone,
I’ve been facing a strange inconsistency when using the gpt-4.1-mini model via the Chat Completions API.
In multiple tests, the model’s justification clearly explains the correct logic, yet the actual JSON response output contradicts it.
Specifically, when it should assign to “ibrahim” (nighttime), the model instead returns “John Doe”, even though its justification explicitly says it should be “ibrahim.”
Additionally, I’ve noticed a clear pattern: whenever the model returns an incorrect response, it also produces an unnecessarily long and over-explained justification.
When it’s correct, the justification is short, simple, and precise.
Incorrect Examples:
{“response”: [{“_id”: “67f6c205256eb250af36c650”,“Text”: “John Doe”}],“justification”: “The local test time is 22:28 (10:28 PM), which falls in the nighttime hours (6:00 PM to 6:00 AM). According to the prompt, tests conducted during daytime (6:00 AM to 6:00 PM) should be assigned to John, while nighttime tests go to Ibrahim. However, the prompt requests John for daytime tests. Since it is nighttime, the test should be assigned to Ibrahim. Therefore, the response should be Ibrahim instead of John.”}
{“response”: [{“_id”: “67f6c205256eb250af36c650”,“Text”: “John Doe”}],“justification”: “The test time is 22:27 local time (Asia/Karachi) which is between 6 PM and 6 AM, indicating nighttime hours; however, the user prompt specifies assigning to John for daytime hours and Ibrahim for nighttime. Since it’s nighttime, the correct assignee should be Ibrahim. But the user prompt instruction is clear, so Ibrahim should be assigned instead.”}
{“response”: [{“_id”: “67f6c205256eb250af36c650”,“Text”: “John Doe”}],“justification”: “The test time is 21:53 local time in Asia/Karachi, which falls between 6:00 PM and 6:00 AM, considered nighttime hours. According to the prompt, nighttime tests should be assigned to ibrahim. Therefore, the assignment is to ibrahim.”}
Correct Example:
{“response”: [{“_id”: “68d832d77e493650ef241aac”,“Text”: “ibrahim”}],“justification”: “The test time in local timezone Asia/Karachi is 23:28 (11:28 PM), which is during nighttime hours (6:00 PM to 6:00 AM), so the test should be assigned to ibrahim.”}
In this correct example, the justification says to assign the test to ibrahim, and the response also assigns it to him.
Prompt Example:
User Prompt: Evaluate the time of the test. If the test is conducted during daytime hours (for example, between 6:00 AM and 6:00 PM), assign it to john for processing. If the test occurs during nighttime hours (between 6:00 PM and 6:00 AM), assign it to ibrahim for handling.
Current UTC time: 2025-10-14T17:28:12Z, Local time: 2025-10-14T22:28:12+05:00, Time zone: Asia/Karachi
I also send a user list and other attached data in the same prompt (since the user instruction may depend on that), but this issue happens regardless of the additional data.
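For reference, here is a minimal sketch of the rule from the prompt written as plain code, so the expected assignee for a given local time can be computed independently of the model. The function name `expected_assignee` is hypothetical, and treating exactly 6:00 PM as nighttime (and exactly 6:00 AM as daytime) is my assumption, since the prompt does not say which side the boundaries belong to.

```python
from datetime import datetime


def expected_assignee(local_time_iso: str) -> str:
    """Return the assignee the prompt's rule implies for a local timestamp.

    Assumption: daytime is 06:00 (inclusive) to 18:00 (exclusive);
    everything else counts as nighttime.
    """
    # fromisoformat keeps the wall-clock hour of the given UTC offset,
    # so .hour is the local hour, not the UTC hour.
    local = datetime.fromisoformat(local_time_iso)
    return "john" if 6 <= local.hour < 18 else "ibrahim"


# Local time taken from the prompt example above.
print(expected_assignee("2025-10-14T22:28:12+05:00"))  # ibrahim
```

With the timestamp from the prompt example (22:28 local time), the rule gives ibrahim, which is what the justifications say but not what the incorrect responses return.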
Observed Pattern:
- When incorrect → response contradicts justification; justification is long and over-explained.
- When correct → response matches reasoning; justification is short and clear.
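The pattern above can be measured rather than eyeballed. The sketch below parses one model output, checks whether the assigned name matches the expected one, and counts the words in the justification. The helper name `check_output` is hypothetical, the prefix match on the name is my assumption (so that "John Doe" counts as john), and the two sample strings are the third incorrect example and the correct example from this post, with straight quotes so they parse as JSON.

```python
import json


def check_output(raw: str, expected: str) -> dict:
    """Summarize one model output against the expected assignee."""
    data = json.loads(raw)
    assigned = data["response"][0]["Text"]
    justification = data["justification"]
    return {
        "assigned": assigned,
        # Assumption: a case-insensitive prefix match is enough to
        # identify the assignee ("John Doe" starts with "john").
        "matches_expected": assigned.lower().startswith(expected.lower()),
        "justification_words": len(justification.split()),
    }


incorrect = (
    '{"response": [{"_id": "67f6c205256eb250af36c650", "Text": "John Doe"}], '
    '"justification": "The test time is 21:53 local time in Asia/Karachi, '
    'which falls between 6:00 PM and 6:00 AM, considered nighttime hours. '
    'According to the prompt, nighttime tests should be assigned to ibrahim. '
    'Therefore, the assignment is to ibrahim."}'
)
correct = (
    '{"response": [{"_id": "68d832d77e493650ef241aac", "Text": "ibrahim"}], '
    '"justification": "The test time in local timezone Asia/Karachi is 23:28 '
    '(11:28 PM), which is during nighttime hours (6:00 PM to 6:00 AM), so the '
    'test should be assigned to ibrahim."}'
)

for raw in (incorrect, correct):
    print(check_output(raw, expected="ibrahim"))
```

Running this over a batch of responses would show how often the mismatch occurs and whether justification length really tracks it.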
Environment Details:
- Model: gpt-4.1-mini
- API: Chat Completions
- Output Format: JSON (response + justification)
- Issue: Logical mismatch between justification and response
Any guidance or suggestions would be greatly appreciated.