GPT-4 vs GPT-4o: which is better?

I second all that Y4ZM said, and I’m going to add my own (anecdotal) account.

I use both the API and the chat all day long, every day. I have done so for over a year, so I would say I am very adept at prompting.

I use the API in a CAT tool that I developed and also use myself.
I use the chat all day long, either for solving one-off coding problems or for translation support.

I feel that GPT4o is marginally better in its translation choices both in the API and on the chat, and every new model since 3.5 has made progress in this respect, to varying degrees.

But for help with coding (on the chat), GPT4o is incredibly bad. I should say infuriatingly bad. I strive to be more stubborn than a computer, so whenever I need some help I start off with GPT4o.

Often I run into a dead end with GPT4o; then I downshift to GPT4, start from scratch, and I’m able to solve the problem within 5 minutes - and not because I had already eliminated a bunch of alternatives with 4o, but simply because 4 is much better at building on a train of thought.

Interestingly (still in coding), sometimes GPT4 has trouble, so I downshift to GPT3.5, and I find that 3.5’s suggestions are much more helpful. Even when in the end I can’t find a solution that would be reasonable in the real world, I find that GPT3.5’s suggestions are much more to the point and insightful than those of either of the other two.

GPT4o flat out ignores many of my instructions, refuses to change track (if I say “let’s try a different approach…”), repeats ad nauseam suggestions I have repeatedly told it don’t work, and is a lot worse than GPT4 at taking into account the history of the conversation.

The new “memory” feature in GPT4o seems to be just an extension of the user settings introduced with GPT4. And GPT4 and 4o are equally bad at abiding by those instructions - they will use them at the beginning of a conversation, but soon start ignoring them. I told 4o to remember that “following these instructions is more important than providing a good answer” and it seemed to save that into the memory, and then went right back to ignoring them.

But, like I said in the beginning, all this is anecdotal. I have neither the time nor the inclination to gather data and document all this. Nor do I think there would be any value in doing so - I’m sure the developers are aware of and addressing these issues for the upcoming releases.

I just wanted to vent. Thanks for listening.
