Variance in time to first token and in total time for a 256-token response, across 100 trials.
Note that each graph is scaled independently, and each is scaled to 12 bins with non-zero values.
gpt-4.1-mini time to first token
gpt-4.1-nano time to first token
gpt-4.1-mini total time to 256 tokens
gpt-4.1-nano total time to 256 tokens
No retries were allowed, and every request succeeded. Each request used a cache-breaking pattern with 1800 input tokens. Launch rate: 600 RPM.
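To make the test conditions concrete: 600 RPM works out to one launch every 0.1 s, so 100 trials are all launched within about 10 seconds. The sketch below shows one way such a schedule and a cache-breaking prompt could be produced. The function names and the unique-prefix approach are assumptions for illustration, not the script actually used for these runs.

```python
import uuid

def launch_offsets(n_requests: int, rpm: float) -> list[float]:
    """Seconds after start at which each request is launched, at a fixed rate."""
    interval = 60.0 / rpm  # 600 RPM -> 0.1 s between launches
    return [i * interval for i in range(n_requests)]

def cache_breaking_prompt(base_prompt: str) -> str:
    """Assumed approach: put a unique token at the very start of the prompt,
    so no two requests share a cacheable prefix."""
    return f"[run {uuid.uuid4().hex}]\n{base_prompt}"

offsets = launch_offsets(100, 600)
# Placeholder body; the real test used roughly 1800 input tokens.
prompts = [cache_breaking_prompt("placeholder prompt body") for _ in offsets]

print(f"interval: {offsets[1] - offsets[0]:.2f} s, last launch at {offsets[-1]:.1f} s")
```

Because the launches are scheduled rather than chained, slow responses overlap with later requests instead of delaying them.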
Metric (TPS = tokens per second) | Average | Minimum | Maximum |
---|---|---|---|
mini total TPS | 94.1 | 26.8 | 128.6 |
nano total TPS | 153.1 | 26.1 | 242.7 |
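For reading the table against the total-time graphs, the TPS extremes can be converted back into seconds. This is a rough sketch that assumes "total TPS" means 256 tokens divided by the full response time, including the wait for the first token; the post does not spell out the formula.

```python
TOKENS = 256  # response length used in every trial

# Values copied from the table above.
tps_table = {
    "gpt-4.1-mini": {"average": 94.1, "minimum": 26.8, "maximum": 128.6},
    "gpt-4.1-nano": {"average": 153.1, "minimum": 26.1, "maximum": 242.7},
}

def total_seconds(tps: float, tokens: int = TOKENS) -> float:
    """Assumed definition: total TPS = tokens / total response time."""
    return tokens / tps

for model, stats in tps_table.items():
    slowest = total_seconds(stats["minimum"])  # lowest TPS = longest response
    fastest = total_seconds(stats["maximum"])  # highest TPS = shortest response
    print(f"{model}: fastest {fastest:.2f} s, slowest {slowest:.2f} s")

# The average TPS is deliberately not inverted: the mean of per-trial rates
# is not the same as 256 divided by the mean of per-trial times.
```

Under that assumption, the slowest trial for each model took a little under 10 seconds, while the fastest trials finished in roughly 2 seconds (mini) and 1 second (nano).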