gpt-4o-mini wins on speed, with both the lowest average latency and the highest token rate.
| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-2024-08-06 | 4 | 0.739 | 41.698 |
| gpt-4o-2024-05-13 | 4 | 0.730 | 64.069 |
| gpt-4o-2024-11-20 | 4 | 0.676 | 37.113 |
| gpt-4o-mini | 4 | 0.558 | 111.561 |
| gpt-3.5-turbo | 4 | 0.571 | 63.459 |
(These figures are from me running all 20 API call trials in parallel, with a small messages input.)
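A benchmark like this can be reproduced with a short asyncio harness. The sketch below is an assumption, not the exact script behind the table: the prompt is made up, and it treats latency as total call time and rate as completion tokens divided by that time (the original may have measured these differently, e.g. time to first token with streaming).

```python
# Minimal sketch of a parallel latency benchmark; model list, trial count,
# and prompt are assumptions, not the exact harness behind the table above.
import asyncio
import time

from openai import AsyncOpenAI  # pip install openai

MODELS = [
    "gpt-4o-2024-08-06",
    "gpt-4o-2024-05-13",
    "gpt-4o-2024-11-20",
    "gpt-4o-mini",
    "gpt-3.5-turbo",
]
TRIALS = 4
# A deliberately small messages input.
MESSAGES = [{"role": "user", "content": "Write one sentence about latency."}]

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def one_trial(model: str) -> tuple[str, float, float]:
    """Run one chat completion; return (model, latency_s, tokens_per_s)."""
    start = time.perf_counter()
    resp = await client.chat.completions.create(model=model, messages=MESSAGES)
    latency = time.perf_counter() - start
    # Rate here = completion tokens / total call time (an assumption).
    return model, latency, resp.usage.completion_tokens / latency


async def main() -> None:
    # Fire all 20 calls (5 models x 4 trials) at once, then average per model.
    tasks = [one_trial(m) for m in MODELS for _ in range(TRIALS)]
    results = await asyncio.gather(*tasks)
    for model in MODELS:
        rows = [(lat, rate) for m, lat, rate in results if m == model]
        avg_lat = sum(lat for lat, _ in rows) / len(rows)
        avg_rate = sum(rate for _, rate in rows) / len(rows)
        print(f"| {model} | {len(rows)} | {avg_lat:.3f} | {avg_rate:.3f} |")


if __name__ == "__main__":
    asyncio.run(main())
```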
gpt-4o-mini has a noticeably different response quality and depth of understanding, especially over a longer chat. It also accepts much larger message inputs. It may handle predictable chat well, but it does not adapt as well to the original tasks an API developer might "program" into it. You will need to evaluate the quality of each model for yourself.