Which model is best for speed and accuracy?

gpt-4o-mini wins for speed: it had the lowest average latency and by far the highest token rate in my trials.

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-2024-08-06 | 4 | 0.739 | 41.698 |
| gpt-4o-2024-05-13 | 4 | 0.730 | 64.069 |
| gpt-4o-2024-11-20 | 4 | 0.676 | 37.113 |
| gpt-4o-mini | 4 | 0.558 | 111.561 |
| gpt-3.5-turbo | 4 | 0.571 | 63.459 |

(This is from running all 20 API call trials in parallel, 4 per model, with a small messages input.)
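
If you want to run a similar comparison yourself, here is a minimal sketch of a parallel timing harness. It is not the script behind the numbers above. It assumes "latency" means time until the first streamed token and "rate" means tokens per second after that point, which may differ from how the table was measured. `fake_stream` is a hypothetical stand-in; you would swap in a generator that yields chunks from your own streaming API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class Trial:
    model: str
    latency_s: float  # assumed definition: time until the first token arrives
    rate_tps: float   # assumed definition: tokens per second after the first token


def run_trial(model: str, stream_fn: Callable[[str], Iterator[str]]) -> Trial:
    """Time one streamed response from stream_fn(model)."""
    start = time.perf_counter()
    first = None
    tokens = 0
    for _ in stream_fn(model):
        if first is None:
            first = time.perf_counter()
        tokens += 1
    end = time.perf_counter()
    if first is None:
        raise ValueError(f"no tokens received for {model}")
    gen_time = end - first
    rate = tokens / gen_time if gen_time > 0 else 0.0
    return Trial(model, first - start, rate)


def summarize(trials: List[Trial]) -> Dict[str, Tuple[int, float, float]]:
    """Return {model: (trial count, avg latency, avg rate)}."""
    by_model: Dict[str, List[Trial]] = {}
    for t in trials:
        by_model.setdefault(t.model, []).append(t)
    return {
        m: (len(ts), mean(t.latency_s for t in ts), mean(t.rate_tps for t in ts))
        for m, ts in by_model.items()
    }


def benchmark(models: List[str], trials_per_model: int,
              stream_fn: Callable[[str], Iterator[str]]) -> Dict[str, Tuple[int, float, float]]:
    """Launch every trial at once, one thread per call, then average per model."""
    jobs = [m for m in models for _ in range(trials_per_model)]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        trials = list(pool.map(lambda m: run_trial(m, stream_fn), jobs))
    return summarize(trials)


def fake_stream(model: str) -> Iterator[str]:
    """Hypothetical stub: replace with a generator over real streamed chunks."""
    time.sleep(0.01)          # simulated wait before the first token
    for _ in range(5):
        yield "tok"
        time.sleep(0.001)     # simulated gap between tokens


if __name__ == "__main__":
    for model, (n, lat, rate) in benchmark(["model-a", "model-b"], 4, fake_stream).items():
        print(f"{model} {n} {lat:.3f} {rate:.3f}")
```

Note that this counts streamed chunks, not true tokens, so a real version would need to count tokens with a tokenizer or read them from the usage data the API reports. With only 4 trials per model, run-to-run variance can easily outweigh small differences in the averages.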

gpt-4o-mini has decidedly different response quality and understanding, especially in a longer chat. It also accepts much more input in its messages. It might chat well in a predictive sense, but it does not adapt well to the original tasks an API developer might “program”. You will need to evaluate the quality of each model for your own use case.
