I’m not even convinced it’s smarter, period: smarter than the best-in-class model you can hand a refactor request and get a one-and-done answer from.
More time spent “vibing” than it would take to just write the code: corrections, examples it decides are needed, wandering outside the scope, chatting at you inside the code, and even adding prints just to “chat”.
Low effort: too dumb. Medium effort: too disobedient, ignoring instructions outright.
The model also has a behavior issue: it writes a lot of hedging code. As in, “this code is gonna run and fail silently, because I don’t actually know what I’m doing after all the speculating I did.”
Speed:
The billing data lets us infer the token production rate we can’t see directly, and that’s what we’ll have to fall back on to track day-by-day performance.
| Model (minimal) | Trials | Avg Latency (s) | Avg Stream Rate (tok/s) | Avg Total Rate (tok/s) |
|---|---|---|---|---|
| gpt-4.1-mini | 10 | 0.628 | 75.441 | 72.007 |
| gpt-5-mini | 10 | 1.067 | 90.294 | 83.732 |
Note:
- Latency is the time to first token, i.e. the wait the user actually feels.
- Stream rate counts tokens (via tiktoken) only over the streaming window after the first token.
- Total rate divides usage.completion_tokens by the entire call duration.
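
For context, this is roughly how those three columns can be measured; a minimal sketch, assuming the OpenAI Python SDK (v1.x) with streaming usage included and tiktoken’s o200k_base encoding. The model name, prompt, and token cap here are placeholders, not the exact benchmark harness.

```python
# Minimal sketch: latency, stream rate, and total rate for one streamed call.
# Assumes the OpenAI Python SDK (v1.x) and tiktoken; not the exact harness.
import time
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")

def time_one_call(model: str, prompt: str, cap: int = 1024):
    t_start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=cap,
        stream=True,
        stream_options={"include_usage": True},  # final chunk carries usage
    )
    t_first, pieces, usage = None, [], None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage                       # only set on the last chunk
        if chunk.choices and chunk.choices[0].delta.content:
            if t_first is None:
                t_first = time.perf_counter()         # first visible token
            pieces.append(chunk.choices[0].delta.content)
    t_end = time.perf_counter()

    latency = t_first - t_start                             # time to first token
    stream_tokens = len(enc.encode("".join(pieces)))        # counted with tiktoken
    stream_rate = stream_tokens / (t_end - t_first)         # streaming window only
    total_rate = usage.completion_tokens / (t_end - t_start)  # whole call duration
    return latency, stream_rate, total_rate
```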
reasoning_effort = minimal can use as few as 8 reasoning tokens; the default effort can consume all of the writing-assignment benchmark’s 1024 max_completion_tokens internally.
Typical:
model gpt-5-mini: 1024 generated (incl. 8 reasoning), 1007 delivered of 1024 max, o200k_base
model gpt-4.1-mini: 1024 generated, 1024 delivered of 1024 max, o200k_base
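
For reference, a minimal sketch of one benchmark call behind those “Typical” lines. It assumes the Chat Completions endpoint accepts reasoning_effort="minimal" for gpt-5-mini and reports reasoning tokens under usage.completion_tokens_details; the prompt is only a stand-in for the real writing assignment.

```python
# Minimal sketch of one call: total generated, reasoning, and delivered tokens.
# Assumes reasoning_effort="minimal" is accepted and usage details are returned.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")

resp = client.chat.completions.create(
    model="gpt-5-mini",
    reasoning_effort="minimal",      # vs. the default effort
    max_completion_tokens=1024,      # the benchmark's cap
    messages=[{"role": "user",
               "content": "Write an essay on human aspirations in a post-money world."}],
)

usage = resp.usage
text = resp.choices[0].message.content or ""
print("generated:", usage.completion_tokens)                           # total, incl. reasoning
print("reasoning:", usage.completion_tokens_details.reasoning_tokens)  # as few as 8 at minimal
print("delivered:", len(enc.encode(text)), "of 1024 max, o200k_base")  # visible text tokens
```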
Unique responses for gpt-4.1-mini (by first 60 chars):
6 | # Human Aspirations in a Post-Money World ## Introduction …
4 | Human Aspirations in a Post-Money World Introduction …
Unique responses for gpt-5-mini (by first 60 chars):
4 | Title: Human Aspirations in a Post‑Money World Introduction…
3 | Human Aspirations in a Post‑Money World Introduction Money…
1 | Title: Human Aspirations in a Post-Money World Introduction…
1 | Human Aspirations in a Post‑Money World Introduction — what…
1 | Human Aspirations in a Post‑Money World Introduction For m…
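
The grouping is nothing fancy; a small sketch, assuming the raw completions are collected as a plain list of strings per model and deduplicated on their first 60 characters only:

```python
# Small sketch: count unique responses by their first 60 characters,
# with newlines flattened for display. Input list is assumed, not the harness.
from collections import Counter

def unique_by_prefix(responses: list[str], prefix_len: int = 60) -> None:
    counts = Counter(r[:prefix_len].replace("\n", " ") for r in responses)
    for prefix, n in counts.most_common():
        print(f"{n} | {prefix}…")
```

Calling unique_by_prefix(gpt5_mini_responses) would print counts in the same “n | prefix…” format as the listings above.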