I’m a bit puzzled by how OpenAI rates the inference speed of its models, and by how the API’s speed compares to something like Gemini. I ran a quick speed test, and these were the results:
$ uvx tacho gpt-4o gpt-4o-mini o4-mini o3 gpt-4.1-nano gpt-4.1-mini gpt-4.1
┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ Model        ┃ Avg t/s ┃ Min t/s ┃ Max t/s ┃ Time  ┃ Tokens ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ o4-mini      │   179.4 │   165.8 │   190.9 │  5.6s │   1000 │
│ o3           │   115.4 │    96.6 │   134.6 │  8.8s │   1000 │
│ gpt-4.1-nano │    95.9 │    75.7 │   106.6 │  5.3s │    500 │
│ gpt-4.1      │    67.6 │    56.9 │    80.7 │  7.5s │    500 │
│ gpt-4.1-mini │    61.7 │    51.5 │    68.4 │  8.2s │    500 │
│ gpt-4o-mini  │    59.0 │    46.1 │    69.9 │  8.7s │    500 │
│ gpt-4o       │    31.7 │    29.1 │    35.9 │ 15.9s │    500 │
└──────────────┴─────────┴─────────┴─────────┴───────┴────────┘
For example, gpt-4.1-nano, which has the highest speed rating of all models (five stars), is about 17% slower than o3, which is rated as the slowest model (one star). Also, gpt-4.1, which is rated at the same speed as gpt-4o, is actually about twice as fast.
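For what it’s worth, the numbers above are easy to sanity-check without tacho. Here’s a minimal sketch using the openai Python SDK that times one non-streaming completion per model; the prompt and the 500-token cap are placeholders I picked, and since time-to-first-token is lumped into the total, the figures won’t line up exactly with tacho’s:

import time
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def tokens_per_second(model: str) -> float:
    # Rough throughput: completion tokens divided by wall-clock time.
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write a 500-word story."}],
        max_tokens=500,  # o3 / o4-mini would need max_completion_tokens instead
    )
    elapsed = time.perf_counter() - start
    return response.usage.completion_tokens / elapsed

for model in ("gpt-4.1-nano", "gpt-4.1-mini", "gpt-4.1"):
    print(f"{model}: {tokens_per_second(model):.1f} t/s")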
Compared to Gemini, the OpenAI API is also quite slow:
$ uvx tacho gemini/gemini-2.5-flash gemini/gemini-2.5-pro gemini/gemini-2.5-flash-lite-preview-06-17 openai/gpt-4.1-mini openai/gpt-4.1 openai/gpt-4.1-nano
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━┓
┃ Model                                      ┃ Avg t/s ┃ Min t/s ┃ Max t/s ┃ Time  ┃ Tokens ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━┩
│ gemini/gemini-2.5-flash-lite-preview-06-17 │   291.0 │   258.1 │   326.7 │  1.7s │    500 │
│ gemini/gemini-2.5-flash                    │   281.4 │   271.5 │   287.3 │  3.5s │    998 │
│ gemini/gemini-2.5-pro                      │   145.7 │   137.0 │   155.5 │  6.9s │    998 │
│ openai/gpt-4.1-nano                        │    86.0 │    58.2 │    97.8 │  6.0s │    500 │
│ openai/gpt-4.1-mini                        │    57.6 │    49.9 │    66.0 │  8.8s │    500 │
│ openai/gpt-4.1                             │    39.7 │    25.3 │    55.1 │ 13.8s │    500 │
└────────────────────────────────────────────┴─────────┴─────────┴─────────┴───────┴────────┘
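The cross-provider comparison can be reproduced with a similar sketch using litellm (the gemini/… and openai/… prefixes above follow litellm’s naming convention); again, the prompt and the run count of three are arbitrary placeholders:

import time
from litellm import completion  # pip install litellm

def throughput(model: str, runs: int = 3) -> float:
    # Average completion tokens per second over a few runs.
    # Needs OPENAI_API_KEY / GEMINI_API_KEY in the environment.
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        response = completion(
            model=model,  # e.g. "gemini/gemini-2.5-flash" or "openai/gpt-4.1"
            messages=[{"role": "user", "content": "Write a 500-word story."}],
            max_tokens=500,
        )
        rates.append(response.usage.completion_tokens / (time.perf_counter() - start))
    return sum(rates) / len(rates)

for model in ("gemini/gemini-2.5-flash", "openai/gpt-4.1-mini"):
    print(f"{model}: {throughput(model):.1f} t/s")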
Is there any chance this will improve in the near future?