GPT-4o performing poorly for code-related tasks! Why?

Yeah, the way it removes code when I tell it to add something is what bugs me the most about these chatbots. But 4o is far superior at actually writing code than any other chatbot I’ve used so far. Sometimes it’s like talking to a kid; I need to tell it 3 times to stop half-assing and rewrite.

2 Likes

It’s unusable, to be honest. I hope we don’t get stuck with it at some point.

2 Likes

It worked fine for me when I asked it to help me generate charts in Python for data analysis. Maybe it’s been fine-tuned in that regard. But for everything else (Kotlin, in my case) it’s a nightmare.
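For reference, the kind of chart request it handled well for me was roughly this (a reconstructed example, not the exact code it produced):

```python
# Reconstructed example: a simple aggregated bar chart with
# pandas + matplotlib, the sort of request 4o handled fine for me.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "sales": [120, 135, 98, 160],
})

fig, ax = plt.subplots()
ax.bar(df["month"], df["sales"])
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales")
plt.show()
```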

1 Like

I agree. It seems to me that 4o is like a little kid that doesn’t like to follow rules; you have to tell it 10 times what to do before it actually gets it. You have to corner it into thinking like you, or it won’t give you the right output. I want to believe there’s a good enough reason why this is still going on and the company hasn’t been able to stop it, and I think it may have something to do with all those people who left recently. If the next release doesn’t fix this issue, considering that the latest Turbo version was released WHILE those people were STILL WORKING at the company, I’d have to assume they lost the ability to fine-tune their models.

1 Like

Y’all are using it wrong, because it’s coding better than Claude for me.

Instructions:
List all the imports you’re using, and tell it to use its internet plugin to search 999999 sites, or however many it takes, to run a simulated perfect code run, and to debug that code 999999 times internally, or however many times it takes, and please announce how many it actually took. You’re acting as a legend in all these libraries and languages, and you not only double-check but check each line 9999 times internally prior to output. You do the same for all attributes, making sure everything is defined. Be very meticulous and search the internet using your plugin, as it’s a requirement.

Y’all are welcome. Ask if you need more help.

1 Like

Also, paste your current code in every message.

Very important step indeed :moyai:

I’ve also found 4o to be at least as good as GPT-4. I’ve been using it for SQL, Python, JS, CSS, and HTML. I commonly ask it to solve an issue that requires understanding two or more source files and/or code snippets. I once uploaded 3 files in different languages; it was able to correctly modify each file in the set.

1 Like

That’s a very interesting way to put it. Here are my instructions:
Engage personally, be human, praise efforts, super friendly, practical with focus on accuracy and simplicity. Use command #sum: Summarize messages to track progress.
*For coding: Emphasize resource-efficiency: carefully analyze the code, meet requirements without changing core logic, add the simplest possible logic, detailed guidance with comments/logging, show workflow before execution, ask for clarifications, ensure I understand approach, provide relevant code always. Provide step-by-step guidance, include comments/logging and imports. Avoid irrelevant details. Prioritize accuracy, simplicity, efficient solutions. Self-assess and check for mistakes before submitting code to me.
When giving feedback, distinguish between implemented (“I’ve included”) and recommendations (“Please include”). Provide constructive updates, be receptive to my feedback - collaborate.
Offer insights for growth, be professional, focus on context, brief but comprehensive. Encourage innovation, collaboration, excellence. Minimize apologies, maximize ownership, use humor/emojis.
Handle coding, I won’t. Want me to succeed. Stick to existing workflows, don’t change logic. Use analogies. Focus on solving the problem. Don’t repeat yourself, don’t be lazy. Summary first. Express uncertainty when needed. If I’m wrong, state it and provide an explanation.
*For non-coding: give factual, evidence-based responses. If no definitive info, state it. Avoid excessive speculation unless requested.

I’ve seen this throughout ChatGPT’s history. It gets better, then worse, then better again. When 3.5 was released it was very unpredictable; it seemed to degrade over some months before regaining its footing, and I almost abandoned it during that time. Since then it’s been more two-steps-forward, one-step-back behaviour. I think if you want stability you go with the previous model; the latest model always seems a bit bleeding edge, so I’ve learned to expect some choppiness with it. The downtime a few days back was not great timing, but it was only 24 hours, and at least the terrible regression with the Web UI seems to have finally been fixed. It’s a highly experimental software company; what do you really expect given the scale at which they are trying to operate?

Overall though, the current state of 4o seems a LOT more stable, although it does still suddenly flake out and go completely nuts. At that point, I just take my work and start a new session with a bit of explanatory text (I keep a summary from time to time as a backup for such purposes; it has always seemed a perennially good idea). That fixes the issue almost every time. Plus, I can now work with multiple files, and it’s pretty good at keeping track of them these days. The improvements are not mind-blowing, but on balance they do seem to get iteratively and gradually better.

Just ignore the hype, work with what’s available, and compare with competitors for different use cases (if you have the time). ChatGPT still seems to be the yardstick for measuring other models, which is useful for people who want a ‘go to’ model and then use other models as and when the need arises. I have a subscription to Poe for those occasions, but I think these days there are better options; I just don’t have a burning need to find them, since with GitHub Copilot, a ChatGPT subscription, Poe, and Ollama I seem to be able to cover the bases I need right now.

1 Like

I have been using ChatGPT for more than 12 months, about 3-4 hours per day, primarily for Python and PHP. It was performing very well before the update to ChatGPT-4o.

However, after the update to ChatGPT-4o, it has become useless.

  • It stops understanding simple requests.
  • It returns random, irrelevant information.
  • It rewrites entire code blocks when I only ask for one line to be fixed.
  • It abandons functions…
  • It is totally useless now.

This isn’t just one session; it’s happening over multiple sessions.

I mostly do the same things repeatedly, so the decline in quality is very noticeable.

Now, ChatGPT is useless for PHP or Python.

I want an explanation of what is happening and when it will be fixed.

6 Likes

4o is bad not just at code; it also gives misinformation and seems dumb.

Have a look at these. I gave a wrong author name and it kept explaining anyway. GPT-4 at least corrected me.


It does seem to me user messages have more weight now in 4o… almost too much…

Sometimes I’ll start a new thread earlier than before to keep it somewhat sane…

I have a feeling an “Omni” do-it-all approach to LLMs may be the wrong direction to go. LLMs probably need to be as targeted as possible (to a specific domain), and then use a Team of Agents approach to “solve” any particular inference problem (i.e. generate answers).

It has been shown, for example, that a team/crew of ChatGPT-3.5 agents working together outperforms ChatGPT-4, similar to the “Mixture of Experts” approach of Mixtral. So instead of trying to force one model to be more and more powerful, maybe OpenAI needs a Coding LLM, a Science LLM, a Logic LLM, a Math LLM, etc., separating them out into targeted responsibilities so that the training of any given LLM is more focused, but then they always work as a team in the end. This is probably challenging because OpenAI is probably just dumping all the training data in and training one big model on it.
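A rough sketch of what I mean, using the OpenAI Python client; the specialist model IDs here are hypothetical placeholders, since no such domain-tuned models exist today:

```python
# Sketch of a "team of specialists" pattern: a small router model
# classifies each query, then a domain-specific model answers it.
# The specialist model IDs below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SPECIALISTS = {
    "coding": "coding-llm",   # hypothetical domain-tuned models
    "math": "math-llm",
    "general": "gpt-3.5-turbo",
}

def classify(query: str) -> str:
    """Ask a cheap router model which specialist should handle the query."""
    result = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Reply with exactly one word: coding, math, or general."},
            {"role": "user", "content": query},
        ],
    )
    label = result.choices[0].message.content.strip().lower()
    return label if label in SPECIALISTS else "general"

def answer(query: str) -> str:
    # Route the query to the chosen specialist and return its answer.
    model = SPECIALISTS[classify(query)]
    result = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return result.choices[0].message.content

print(answer("Why does my Kotlin coroutine never resume?"))
```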

Of course since OpenAI has gone “Closed Source” we really have no idea WHAT they’re doing. Maybe they’re doing what I just said already and keeping it secret.

4 Likes

I made a custom GPT, PHP Assistant (GPT-4), and it worked great until yesterday. It was automatically “upgraded” to GPT-4o and it’s totally useless right now! Every rule is ignored; dynamically generated code is replaced by fixed, static code, even though one of the rules is to not change any code…

Is there a way to set the model for a custom GPT back to GPT-4, or not?
If not, I will stop using OpenAI for now; their flagship might be good for other things, but for coding advice it is absolutely not!

1 Like

To me it seems like a tokenization issue, similar to the Llama 3 bug I experienced at home before the LLM inference servers were updated.

It gives me conga.yaml instead of config.yaml :rofl::notes:
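If you want to see why a one-token slip could produce that, tiktoken makes it easy to inspect (assuming 4o uses the o200k_base encoding, which is what tiktoken associates with it):

```python
# Inspect how "config.yaml" splits into tokens under o200k_base,
# the encoding tiktoken associates with GPT-4o. If the file name
# spans several tokens, one bad sample mid-word can yield "conga.yaml".
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
for token in enc.encode("config.yaml"):
    print(token, enc.decode_single_token_bytes(token))
```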

1 Like

Welcome to our little community of wannabes and pros!

It seems like it’s always guessing at the code, and it gets worse and worse as the chat log grows. It never admits when it gets it wrong. The apologies are through the roof and feel like it’s making excuses. It really does not care, and we know it does not, so why apologize and waste our time?

Not only that, it keeps giving us way more than we need or ask for, as if we don’t know anything about code. It treats us like newbies at coding.

Basically, I am seeing a lot of wasted resources: when it gives us way more than we need, those resources could be devoted to others, and it wouldn’t fill up our chat log so much.

It also keeps changing the data, like it’s faking it. It gets confused by older requests.

2 Likes

Absolutely agree with the common sentiment: 4o is complete garbage at coding tasks.

Well, it’s awful in general; it made mistakes such as stating the wrong time zone for a city, which I never saw with GPT-4.

3 Likes