This week's launches: o3, o4-mini, GPT-4.1, and Codex CLI

Variance in time to first token and in total time for a 256-token response, across 100 trials.

Note that each graph is scaled independently, with the range divided into 12 bins that have non-zero values.
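One possible reading of that scaling, as a minimal sketch: each data set is binned over its own min..max range into 12 equal-width bins. The function name `histogram` and the equal-width choice are assumptions for illustration, not the script actually used for the graphs.

```python
def histogram(samples, bins=12):
    """Count samples into equal-width bins spanning this data set's own min..max."""
    lo, hi = min(samples), max(samples)
    # Fall back to a width of 1.0 when every sample is identical (hi == lo).
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for s in samples:
        # The maximum value would index one past the end, so clamp it into the last bin.
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts
```

Because the edges come from each data set's own extremes, two graphs produced this way cannot be compared bar for bar; only the shape of each distribution is meaningful.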

[Figures: histograms of gpt-4.1-mini time to first token, gpt-4.1-nano time to first token, gpt-4.1-mini total time to 256 tokens, and gpt-4.1-nano total time to 256 tokens]

No retries were allowed, and every request succeeded. Each request used a cache-breaking pattern with 1800 input tokens. Launch rate: 600 RPM (10 requests per second).
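For anyone wanting to take similar measurements, here is a rough sketch of timing a streamed response. It assumes the response arrives as an iterable of chunks and counts each chunk as one token, which is only an approximation. `time_stream` and `fake_stream` are hypothetical names; the fake stream stands in for a real API call so the example runs offline.

```python
import time

def time_stream(chunks, clock=time.perf_counter):
    """Measure time to first chunk, total time, and chunks per second."""
    start = clock()
    ttft = None
    count = 0
    for _ in chunks:
        now = clock()
        if ttft is None:
            ttft = now - start  # latency until the first chunk arrives
        count += 1
    total = clock() - start
    tps = count / total if total > 0 else 0.0
    return {"ttft": ttft, "total": total, "tokens": count, "tps": tps}

def fake_stream(n=256, delay=0.0):
    """Hypothetical stand-in for a streaming response of n chunks."""
    for i in range(n):
        if delay:
            time.sleep(delay)
        yield f"tok{i}"

result = time_stream(fake_stream(256))
```

Note that the rate computed here divides by the whole elapsed time, so it includes the wait for the first token.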

Metric            Average   Minimum   Maximum
mini total TPS       94.1      26.8     128.6
nano total TPS      153.1      26.1     242.7
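To relate these rates to wall-clock time, the sketch below converts a tokens-per-second figure into seconds for a 256-token response. It assumes "total TPS" means 256 tokens divided by the full response time, first-token wait included; that definition is an assumption. `seconds_for` is a hypothetical helper.

```python
TOKENS = 256  # response length used in the trials

def seconds_for(tps, tokens=TOKENS):
    """Seconds needed to produce `tokens` at a given tokens-per-second rate."""
    return tokens / tps

# Minimum and maximum TPS values from the table above.
extremes = {
    "mini slowest": 26.8,
    "mini fastest": 128.6,
    "nano slowest": 26.1,
    "nano fastest": 242.7,
}
for label, tps in extremes.items():
    print(f"{label}: {seconds_for(tps):.2f} s")
```

Under that assumption, a two-second response corresponds to exactly 128 TPS. The time implied by the average TPS is not the same as the average time, since the mean of a rate and the mean of its reciprocal differ.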

nano: 256 tokens delivered in under two seconds in over 60% of trials.
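A figure like this can be computed from the per-trial totals with a one-line helper. The sketch below uses made-up placeholder timings purely to show the calculation; they are not the measured data, and `fraction_under` is a hypothetical name.

```python
def fraction_under(totals, limit=2.0):
    """Share of trials whose total time is strictly below `limit` seconds."""
    return sum(t < limit for t in totals) / len(totals)

# Placeholder values for illustration only, NOT the measured trial data.
example_totals = [1.2, 1.8, 2.5, 1.6, 3.1]
share = fraction_under(example_totals)
print(f"{share:.0%} of trials finished in under 2 s")
```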
