GPT-4 vs GPT-4o? Which is the better?

Referring to the APIs:

Both GPT4o and GPT4 Turbo are terrible in comparison to GPT-4 for some things, but in other places GPT4o shares the same terrible logic as GPT-4. GPT4o has been a good supplement for most things that do not involve dealing with analysis and logic, and strict commands, but I often have to switch to GPT-4 for better responses.

  • Example 1 - GPT4o consistently fails to respond properly to system prompts and commands. I gave an instruction for it to catch discrepancies in numbers. If I say the price has increased, but the price has decreased, GPT4o was instructed to not assume I am right and correct me. GPT4o ignored the system prompt. I changed it to GPT4 and GPT4 100% got it right, GPT4o proceeded as if I were correct.
  • Example 2 - If GPT4o and I are talking and I give it a new instruction at the end or beginning of a text in a conversation, it ignores what I have said and does what it defaults to do. It often repeats everything I wrote. If I write a paragraph and tell it not to do something, it repeats the same paragraph I gave it to edit, even with no changes, and even after I told it not to edit or repeat what I have said. If I tell it confirm before doing something, it does it and then asks if that is what I want, but GPT 4 often immediately gets what I asked for and does it right.

I have been extremely confused by all the hype surrounding GPT4o.

  • How can Claude X be better than GPT-4 but GPT4o which is worse than GPT4 in many regards is better than Claude X.
  • How can GPT4o be better when it doesn’t listen to prompts, and fails repeated tasks or in some regards is even more literal than it’s predecessors.

At times I have found GPT4o quicker, faster, and infuriating. It sometimes gives equal output to GPT4, but it is not very good at reasoning and logic.

  • Example 3 - I wrote a lie, and added several winks after it. I added a note to tell GPT4o that the statement written was a lie, and asked to see if it could pick up the context of what the winks meant. Neither GPT4 or GPT4o were able to grasp that logic, but with more specific prompting GPT4 got it, and GPT4o was still confused. With GPT4o saying the winks were positive and referencing the statement as if it were the truth, even with the added context.

Because GPT4o is cheaper and sometimes equivalent to GPT4 (which is at times also a box of rocks), I find myself switching between GPT4o and GPT4 for the same types of conversations that require different types of analysis.

If people were able to get better responses with GPT4o then I need more information:

  • Is this model the API version or the Chat version
  • What parameters -(temperature, sampling, and other settings) are being used? (how can I repeat results)
  • What prompts and tasks are actually being thrown at it during analysis
  • Is it better than GPT-4 (GPT4 Turbo is worse than a box of rocks, so being better than GPT4 Turbo but not better than GPT 4 is not the best starting point)

So far no one has been able to give me this information and I am left baffled at where all this hype, often, from trusted sources are coming from. It feels like the “Asch conformity experiment”, where even when you know something is not true, the fact that everyone insists it is true pushes you to agree with them. I specifically went on a search just to figure out if I was actually going crazy with how bad GPT-4o actually is.

9 Likes