This system worked reliably with GPT-4.1-mini and GPT-4o-mini.
## The core problem: wasted time “thinking”

GPT-5-mini spends a noticeable amount of time internally reasoning even when:

- the task is simple
- the output format is rigid
- no explanation is requested
- reasoning provides no benefit
In practice this results in:

- higher time-to-first-token
- longer total response times
- unacceptable latency for turn-based or real-time systems
For applications where the model is effectively part of the game loop or rules engine, this latency is a blocker.
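To make the time-to-first-token claim measurable, here is a minimal sketch assuming the official `openai` Python SDK with streaming chat completions; the function, model names, and prompt are illustrative placeholders, not part of the original system:

```python
import time

from openai import OpenAI  # assumes the official openai Python SDK

client = OpenAI()

def time_to_first_token(model: str, prompt: str) -> float:
    """Seconds until the first visible content token arrives via streaming."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Any internal reasoning happens before the first content delta,
        # so all of it lands in this number.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return time.perf_counter() - start

# Model names here are just the tiers discussed in this thread.
for m in ["gpt-4.1-mini", "gpt-5-mini"]:
    print(m, round(time_to_first_token(m, "Reply with exactly: OK"), 3))
```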
## Secondary issue: instruction drift

GPT-5-mini also shows more frequent failures to follow hard constraints:

- adding conversational framing
- explaining actions instead of executing them
- violating “output only JSON” requirements
Even occasional schema violations cascade into downstream failures and force retries. This was rare with GPT-4.1-mini under identical prompts.
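The retry cost is easy to see in code. A minimal sketch of the defensive wrapper these violations force, where `call_model` is a hypothetical zero-argument callable standing in for one step of the pipeline:

```python
import json

def call_with_json_retry(call_model, max_retries: int = 2) -> dict:
    """Retry until the output parses as JSON; every retry adds latency and cost.

    `call_model` is a hypothetical callable returning the model's raw text.
    """
    last_error = None
    for _ in range(max_retries + 1):
        raw = call_model()
        try:
            return json.loads(raw)  # strict: conversational framing breaks this
        except json.JSONDecodeError as err:
            last_error = err
    raise RuntimeError(f"no valid JSON after {max_retries + 1} attempts: {last_error}")
```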
## Why this matters

Not all applications want or need reasoning. There is a large class of systems that need models that are:

- fast
- literal
- boring
- predictable
Examples include simulations, games, workflow engines, and state machines. Reasoning models are valuable — but they are not drop-in replacements for fast instruction followers.
## Concern
If GPT-4-class mini models are retired without a true successor that preserves speed and compliance, existing production systems will be stranded.
## Ask

Please consider:

- maintaining a fast, non-reasoning model tier
- or providing stronger controls to disable reasoning overhead and enforce structured output compliance (see the sketch below)
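On the second point, part of the control surface already exists. A hedged sketch using Structured Outputs in the official `openai` Python SDK to force schema compliance; the schema, prompt, and model name are illustrative only:

```python
from openai import OpenAI

client = OpenAI()

# Structured Outputs: a strict JSON Schema the response must conform to.
resp = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "Move the knight to f3."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "move",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "piece": {"type": "string"},
                    "square": {"type": "string"},
                },
                "required": ["piece", "square"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # JSON only, no conversational framing
```

This addresses the compliance half of the ask; it does nothing about the reasoning latency itself.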
Well, that’s great news. I hope 4.1 mini is never retired, because then I can just keep building. But the reality appears to be that as 5.3 rolls out, 4 gets deprecated and eventually retired, just as 3 and 3.5 were.
I don’t need a reasoning model. But it seems everything coming down the pipeline is reasoning-based. That’s great for reasoning applications, but quite terrible for the follow-the-instructions approach I had grown accustomed to.
Oh, and I should mention: the outputs from 4.1 mini are better for my application than those from 5 mini (which keeps explaining its reasoning instead of generating the response it was instructed to). So the regression is twofold: less efficient and worse quality.
I’m attempting to run my prompt-chaining application through Gemini, Claude, etc., and none of them can quickly and strictly follow instructions the way 4.1 and 4o mini do. Grok 2 could do it when it came out, but right before Grok 3 it underwent some sort of modification that made it more prone to error.
The top competitors are all error-prone and far too slow. GPT-5 mini is likewise too slow and explains itself too much.
I’m surprised at this outcome, especially as we see GPT-4 models being moved off of ChatGPT. If they ever disappear from the API, that’s the end of my application, and likely of thousands of other applications being developed with them.
Observe the reasoning token count that comes out of nano: it is extreme, and the model still can’t think its way out of low benchmarks. You have reasoning.effort “minimal” to try on either model.
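For reference, a minimal sketch of trying that knob and inspecting the reasoning token count, assuming the official `openai` Python SDK (which exposes `reasoning_effort` on chat completions and reports `reasoning_tokens` in the usage details); the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Compare reasoning-token usage across effort settings.
for effort in ["minimal", "medium"]:
    resp = client.chat.completions.create(
        model="gpt-5-nano",
        reasoning_effort=effort,  # the knob mentioned above
        messages=[{"role": "user", "content": 'Output only: {"ok": true}'}],
    )
    details = resp.usage.completion_tokens_details
    print(effort, "reasoning tokens:", details.reasoning_tokens)
```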