All these AIs kind of know how benchmarks are run, so they can make adjustments to pass the tests and top the leaderboards. Real-life scenarios are different from basic test scenarios. I see no other explanation for such poor performance in the real world vs. in tests.
Also, I think these models are more like flash models than full-fledged models. Gemini Pro vs. Flash perform very differently, and even GPT-4 vs. 4o perform differently. These models are optimized for cost, not performance.
Yup, totally agree, and it’s kind of worrying that they now consider GPT-4 a legacy model… I always switch to GPT-4 for any code-related work. GPT-4o seems to be the result of their experiments a few months ago, when GPT-4 suddenly seemed to drop in performance.
Gave the same script to GPT-4 “Legacy” and GPT-4o.
GPT-4o’s code did not work out of the box; GPT-4 “Legacy” code worked right off the bat. (I copied the exact same prompt.)
Cancelled my sub and will definitely look for alternatives.
My gut feeling:
GPT-4 was good, but too expensive to run, so they tried to “optimize” it (a.k.a. dumb it down) to save costs.
The new version of GPT-4o as of Aug 20 is ten times better. It’s more eager to understand issues and resolve them through discussion rather than just running off and producing code! Give it a try; I’m impressed!
It’s absolutely useless; OpenAI needs a good kicking for ruining it.
The funny thing is, GPT-4 was quite good, then eventually it became so useless that I spent more time swearing at it than getting anything done, wasting hours and hours of my time.
Then we got GPT-4o, and I thought all my problems were solved. Well, they were for a short time, but now the same pattern has occurred; the only thing it has gotten better at is taking the swearing I give it.
So now all I can do is revert to using legacy GPT-4.
Somebody else has got to provide a better solution for coding; all this experimentation is making OpenAI unreliable.