All these AIs kind of know how benchmarks are run, and they can make adjustments to pass the tests and top the leaderboards. Real-life scenarios are different from basic test scenarios. I see no other reason for such poor performance in reality vs. in tests.
Also, I think these models are more like flash models and not full-fledged models. Gemini Pro vs. the Flash models perform very differently. Even GPT-4 vs. 4o performs differently. These models are cost-effective, not performance-effective.
Yup, totally agree, and it’s kind of worrying that they now consider GPT-4 a legacy model… I always switch to GPT-4 for any code-related work. GPT-4o seems to be the result of their tests a few months ago, when GPT-4 suddenly seemed to decrease in performance.
Gave the same script to GPT-4 “Legacy” and GPT-4o.
The GPT-4o code did not work directly; the GPT-4 “Legacy” code worked right off the bat (I copied the exact same prompt).
Cancelled my sub and will definitely look for alternatives.
My gut feeling:
GPT-4 was good, but too expensive to run, so they tried to “optimize” it (a.k.a. dumb it down) to save costs.
The new version of GPT-4o as of Aug 20 is ten times better now. It’s more eager to understand issues and resolve them through discussion rather than just running off and producing code! Give it a try, I’m impressed!
It’s absolutely useless; OpenAI needs a good kicking for ruining it.
The funny thing is, GPT-4 was quite good, then eventually it became useless, to the point where I spent more time swearing at it than getting anything done, wasting hours and hours of my time.
Then we got GPT-4o and I thought all my problems were solved. Well, they were for a short time, but now the same pattern has occurred; the only thing it has gotten better at is taking the swearing I give it.
So now all I can do is revert to using legacy 4.
Somebody else has got to provide a better solution for coding; too much experimentation is making OpenAI unreliable.
It’s just rubbish. So much for using GPT to make things faster; it just ruined the code base completely. Even the API is rubbish, I just wasted money on the API. o1 is better, even if I have to use it in chat.
Has anyone else been frustrated by the Canvas formatting? I like it – much better than repeatedly scrolling through answers BUT…
I’ve been trying C, Python, and JS. In all cases the window (seems like it’s not 4o directly) will screw up the code – it will split newline escapes IN strings (i.e. ‘\n’) into syntactically incorrect literal line breaks. Fixing it in the window doesn’t help – 4o ignores the fix and regenerates the same crap.
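To illustrate, here’s the kind of breakage I mean (a made-up minimal Python example, not the actual code I was working on):

```python
# Intended code: the "\n" escape sequence stays inside the string literal.
print("line one\nline two")

# What Canvas kept regenerating (shown here as comments, since it is invalid
# syntax): the "\n" escape became a literal line break, leaving an
# unterminated string literal, which is a SyntaxError in Python:
#
#   print("line one
#   line two")
```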
Yes, I have noticed this. I’ll ask a simple question expecting a one- or two-line answer and get tons of irrelevant code, which is missing key parts of the code anyway!
Since they rolled out the $200 monthly subscription, the regular $20 one sucks, especially for coding. It can’t handle simple code, and I’ve been using ChatGPT for years. Time to try Claude.
You gotta give it detailed prompts and example code, and tell it to look up the API and manual, whatever documentation is necessary. You gotta tell it to look things up on the Internet every single time and give it example code, but it works perfectly if you do those things…
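Roughly the shape of prompt I mean (a made-up example; the function and library are just placeholders):

```
Look up the current `requests` documentation on the Internet before you
answer. I need this Python function changed to retry a failed GET up to
3 times with exponential backoff. Here is my existing code for context:

    def fetch(url):
        return requests.get(url, timeout=10).json()

Keep the same function signature and only add the retry logic.
```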
ChatGPT is generally good for a lot of tasks, but it’s a jack-of-all-trades, master-of-none sort of situation. I think it is the safest bet for anything, but Claude is my go-to for coding-related tasks. ChatGPT does have updated docs, so it depends on your specific use case most of the time.
This post is quite old, and I think LLMs have moved on from GPT-4o. There are better models available now for coding. Even ChatGPT has come out with a few more models since this.