Hi!
You might also want to evaluate GPT-4-Turbo (maybe 1106 or 0125), GPT-4 (0314), and of course GPT-4.5 (don’t forget the system prompt). You might be surprised (and full disclosure, I’d be interested in your results!)
My expectation is that these tasks require an exquisite understanding of nuance - something that bigger models are considerably better at. o3-mini is just a tiny, tiny model that’s been trained to almost algorithmically churn through reasoning steps - but reasoning ability alone isn’t really a sign of high intelligence in my book, especially if the reasoner isn’t capable of understanding the condensed subject matter in the first place.
In other words, o3-mini is much better at pulling in a lot of diverse (and simple) information from the broader context and filtering/aggregating it for relevance, but not so much at grasping individual, condensed information.
They’re just different tools for different jobs, and you’ve hit one of the major limitations of tiny reasoning models. There’s a lot of opportunity in running these models in tandem and using the right one for the right task.
Does this somewhat answer your question?