Which model is best for speed and accuracy?

I’m building an AI agent chatbot using OpenAI’s API. Which model would be the best choice for speed and accuracy: gpt-4o-mini-2024-07-18 or gpt-3.5-turbo?

1 Like

gpt-4o-mini wins for speed.

| Model | Trials | Avg Latency (s) | Avg Rate (tokens/s) |
|---|---|---|---|
| gpt-4o-2024-08-06 | 4 | 0.739 | 41.698 |
| gpt-4o-2024-05-13 | 4 | 0.730 | 64.069 |
| gpt-4o-2024-11-20 | 4 | 0.676 | 37.113 |
| gpt-4o-mini | 4 | 0.558 | 111.561 |
| gpt-3.5-turbo | 4 | 0.571 | 63.459 |

(This is from running all 20 API call trials in parallel, with a small messages input.)
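If you want to reproduce this kind of measurement, here's a minimal sketch, assuming the openai Python SDK with streaming: latency is time to the first streamed token, and the rate approximates one token per chunk. The prompt, model list, and trial count are placeholders.

```python
import asyncio
import time
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-4o-mini", "gpt-3.5-turbo"]  # add the snapshots you care about
TRIALS = 4

async def one_trial(model: str) -> tuple[float, float]:
    """Return (seconds to first token, tokens/s); one chunk ~ one token."""
    start = time.perf_counter()
    first = end = start
    chunks = 0
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
        max_tokens=100,
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if chunks == 0:
                first = time.perf_counter()
            chunks += 1
            end = time.perf_counter()
    rate = chunks / (end - first) if end > first else 0.0
    return first - start, rate

async def main():
    # fire all trials for all models at once, like the table above
    results = await asyncio.gather(
        *(one_trial(m) for m in MODELS for _ in range(TRIALS))
    )
    for i, model in enumerate(MODELS):
        batch = results[i * TRIALS:(i + 1) * TRIALS]
        lat = sum(r[0] for r in batch) / TRIALS
        rate = sum(r[1] for r in batch) / TRIALS
        print(f"{model}: avg latency {lat:.3f}s, avg rate {rate:.1f} tok/s")

asyncio.run(main())
```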

gpt-4o-mini has a decidedly different response quality and understanding, especially in a longer chat. It also accepts far more input: a 128k-token context window, versus 16k for gpt-3.5-turbo. It may chat plausibly, but it does not adapt as well to the novel tasks an API developer might “program” into it. You will need to evaluate the quality of each yourself.

4 Likes

Will the latest version of GPT-4o remain the stable version, i.e. gpt-4o-2024-08-06?

1 Like

One more thing: I also used o3-mini, but I noticed that it's very slow. Why?

If you simply specify “gpt-4o”, you will currently get gpt-4o-2024-08-06. This is a “recommended model” pointer.
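You can verify this yourself: the response object reports the snapshot that actually served the request. A quick sketch with the openai Python SDK:

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",  # the alias, not a dated snapshot
    messages=[{"role": "user", "content": "Hi"}],
    max_tokens=5,
)
# resp.model is the resolved snapshot,
# e.g. "gpt-4o-2024-08-06" at the time of writing
print(resp.model)
```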

The “O” series of AI models, such as o3-mini, are reasoning models. They generate internal planning and thoughts that you do not see, at higher expense. Think of them as puzzle and problem solvers rather than conversationalists. o3-mini also takes a reasoning_effort parameter, so you can tune how much thinking and dedication is allotted before responding.

https://platform.openai.com/docs/models#o3-mini
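For example, a minimal sketch of dialing down the hidden reasoning, assuming the openai Python SDK; the prompt and token budget are placeholders:

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="low",  # "low" | "medium" | "high"; less hidden thinking is faster
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    max_completion_tokens=2000,  # o-series models use this instead of max_tokens
)
print(resp.choices[0].message.content)
```

Note that the unseen reasoning tokens count against max_completion_tokens, so leave headroom beyond the visible answer you expect.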

I assume this is a polite way of saying “comparatively bad”, but I'd like to hear more about your specific experiences: where it falls short, and how you believe it is best used. I've spent extensive time with it as well, and I've been wondering recently whether those shortcomings in comprehension are useful as guidelines for better designing our prompts, tools, etc.

For example, I recently had a problem which was easily solved by using 4o rather than 4o-mini, but after some experimentation, I found a few things that were confusing 4o-mini and got it working with the smaller model. The question (and as far as I know there's no way to test this) is whether doing so also improves comprehension in the larger models.

A “mini” model, or any AI model with a lower parameter count, simply doesn't have as much embedding space to encode layers of pretrained knowledge. GPT-4 is going to do a better job reciting truthful statistics about the 1922 World Series team lineups, listing Amiga game developers who worked at a particular company, or even writing natively in ᐃᓄᒃᑎᑐᑦ than the predictions that come out of more compressed models.

Sure, of course. I don't think anybody is relying on mini models for knowledge. They're primarily used for RAG and function calling. But even for simple applications, I find 4o-mini to be very fragile and demanding of “technique” to get it to work in place of 4o. I've seen nearly no discourse on it, which is why I'm curious whether you have any interesting cases you'd like to share.
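For reference, this is the sort of function-calling duty a mini model typically gets; the get_order_status tool below is a made-up example, not anything from this thread:

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool for illustration
        "description": "Look up the status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "e.g. A1234"},
            },
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where is my order A1234?"}],
    tools=tools,
)
# whether the model reliably emits this tool call, with the right
# arguments, is exactly the fragility being discussed
print(resp.choices[0].message.tool_calls)
```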

For a balance of speed and accuracy, gpt-4o-mini-2024-07-18 is likely the better choice, as it's optimized for efficiency while maintaining strong performance. Note that it is also priced lower than gpt-3.5-turbo, so cost is rarely a reason to prefer the older model. Testing both on your use case is still recommended.
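If it helps, here is a minimal sketch of that side-by-side test, assuming the openai Python SDK; the test prompts are placeholders for examples drawn from your own agent traffic:

```python
from openai import OpenAI

client = OpenAI()

TEST_CASES = [  # replace with real prompts from your chatbot logs
    "Summarize this support ticket in one sentence: ...",
    "Extract the order ID from: 'my order #A1234 never arrived'",
]

for model in ["gpt-4o-mini-2024-07-18", "gpt-3.5-turbo"]:
    print(f"--- {model} ---")
    for prompt in TEST_CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=100,
        )
        # eyeball the outputs, or score them against expected answers
        print(resp.choices[0].message.content)
```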