Yeah, newer accounts aren’t allowed to post links to reduce spam.

If you share the organization/repository part of the URL people will be able to find it.

Got it, I have put some results and instructions on GitHub at emrgnt-cmplxty/automata-v0.

I am happy to discuss these results further. I have seen similar commentary in the issues at evalplus/evalplus.

Thanks.

Hi and welcome to the developer forum!

If you could share the chat link to that along with the source file you uploaded I’d be happy to work with you to resolve the issue you were facing.

1 Like

I don’t see automata-v0, just automata. Is that the right repo?

Apologies, I just made the repository public.

Automata is not a good environment to try to replicate this result in, since it is somewhat dynamic; the v0 creates a more static environment in which to study this.

How will this help? Examining the issue case by case seems like trying to empty the ocean with a spoon – it’s not a problem with a particular chat or code file. Why can’t you escalate these numerous issues from the community, so that OpenAI product managers who insist nothing is wrong may finally begin to believe us and actually solve the underlying issues?

3 Likes

I’ve wasted an entire day having gpt4 lose context from literally the prior response each time. What is going on?

2 Likes

Can you post a snippet of the code that makes the API call, the prompt text sent to the model for the problematic call, and the output produced by the model for that prompt?

I suspect there is a token count issue, i.e. you are going over the 8k token limit for the prompt and reply combined, or there is a prompting issue that we can take a look at.
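As a quick sanity check, here is a minimal sketch (using the tiktoken library; the prompt text is a placeholder) for counting prompt tokens before making the call:

```python
# Minimal sketch: count prompt tokens with tiktoken before calling the API.
# Note: this counts raw text only; chat messages add a few tokens of
# per-message overhead, so treat the result as an approximation.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

prompt = "..."  # the full prompt text sent to the model
prompt_tokens = len(enc.encode(prompt))
print(f"Prompt tokens: {prompt_tokens}")

# The 8k GPT-4 context must fit prompt + completion within 8,192 tokens.
print(f"Room left for the reply: {8192 - prompt_tokens} tokens")
```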

The case-by-case nature of prompt design and iterative development testing is due to the fact that you are not dealing with a fully deterministic instruction->action system. The models are linguistic and statistical in nature, and require that you narrow down the size of the possible solution space in order to reduce the potential for error. This is done with linguistic programming and some trial and error; as we are the pioneers of this technology, there are no textbooks we can reference for best practices and prior methodologies.

While I believe my AI “wrangling” skills have improved vastly over the past few months, I confess that I still understand little about how they are created and trained.

I recently watched this video with interest: https://youtu.be/abilGf5hWyE?si=QAlZgqbsepvgGALw

And I thought: could OpenAI be in the process of “de-training” (if that’s even a thing) its models in anticipation of possible lawsuits? And if so, could that be the effect so many people are reporting here in this topic? I mean, not knowing a thing about the back end of these models, I have to imagine that trying to increase their capacity to handle the hundreds of thousands of users still trying to get access, while at the same time removing whole swaths of core data here and there, would have a detrimental effect on overall usage.

Just a thought.

2 Likes

GPT-4 is now completely unusable for me and I am not quite sure what OpenAI’s strategy is. Have they provided any reasoning anywhere? I haven’t seen them say a word about the issue publicly, but the performance numbers and the sheer quality of responses make it incredibly obvious.

The difference is so stark I genuinely feel insulted by their response. Hanlon’s Razor just doesn’t cut it for me anymore.

To quote Peter Welinder on the 13th, exactly one month after the release that seemed to have broken GPT-4:

No, we haven’t made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one.

Current hypothesis: When you use it more heavily, you start noticing issues you didn’t see before.

I don’t want to call Peter out here personally, but I do think that the moment OpenAI made promises about performance and offered the AI for use in customer products, they took on, if not the legal, at least the moral obligation to try to fulfill those promises. That is what doing business in good faith is about.

Shortly after that quote, researchers at Stanford University published a paper (linked in this thread) with the conclusion that GPT-4 has seen very severe drops in performance since the June update. It was worlds ahead of GPT-3.5 in every single task before; now it is almost an exact 50/50 split in the number of wins GPT-4 takes versus GPT-3.5 (although, overall, GPT-4 is still clearly better).


I must say, it is very hard for me to trust in OpenAI with my projects. They are not a non-profit anymore and have seen incredible revenue growth and public support. It was probably one of the most beloved companies in the first half of this year, which is worth everything when selling to consumers, and they managed to ruin that with many customers by first bringing a product that revolutionized how tens of millions work and then making it utterly unusable, only to state that it is normal variance between tasks.

Many obvious and easy solutions were discussed in this thread, but they just do not seem to care. The data is clear. I personally have countless prompts that make the drop incredibly obvious; there is no possible way they are not aware of it in their data, yet they choose to stay silent.


Not sure what to think, guys. I tried out the March version of GPT-4 from their API, but I had never used it before, so I have no reference point. Did it also see this drop in performance, or could I replace GPT-4 with the API version until GPT-5 or a different LLM arrives?
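For reference, here is a minimal sketch of how I am calling the dated March snapshot through the API (assuming the legacy openai Python SDK and that the gpt-4-0314 snapshot is still being served; the prompt content is a placeholder):

```python
# Minimal sketch: call a dated GPT-4 snapshot instead of the moving "gpt-4" alias.
import openai

openai.api_key = "sk-..."  # your API key

response = openai.ChatCompletion.create(
    model="gpt-4-0314",  # March 2023 snapshot; availability may change
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Refactor this function ..."},
    ],
    temperature=0,  # reduce run-to-run variance when comparing versions
)
print(response["choices"][0]["message"]["content"])
```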

1 Like

Sounds like ChatGPT isn’t for you and you will be best served by going your own way and doing your own thing.

Good luck out there!

The code interpreter is constantly outputting files that are invalid or not even complete. It’s forgetting context in the middle of the output during the response to the uploaded file. It’s not seeing obvious things and refuses to change them when I ask. It’s very odd, because this was better a few days ago. Seems strange; it’s constantly having its context reset. It doesn’t always seem to be healthy.

I have a hypothesis. Are they throttling the service, with some way of reducing the intelligence/GPU power/abilities as we use it more, to limit usage instead of the prompt caps they used previously? Is this a new way to keep resource usage low, so that we get varying degrees of power and intelligence depending on how much we do at a time?

This feels like it may be true. I notice the prompt I have that does a lot of extra iterating doesn’t activate after I have worked on something for a long time, or a lot during that day. It feels like it gets squeezed down to where ChatGPT is just very “dumb” at coding compared to how it was an hour before.

Hi,

Can you please show what is in the “show work” tab?


One of these looks interesting, at least. This seems to be what it keeps doing; it’s doing it every other try, or multiple times within a single prompt.

It just seems to be “stuck”, unable to even think things through to reach any solutions tonight.

1 Like

Since the August 3rd update it’s worse than ever. It’s become worse than GPT-3.5 when it was first released.

I canceled my subscription to Plus today. No reason to pay for a tool that has become more tedious than slogging through StackOverflow.

It seems the costs of running the original version were simply too much for the company to bear, and they are trying to cut corners to save money.

Of course, they will find that paying customers like me aren’t going to shell out hundreds of euros to OpenAI just to get the occasional amusing limerick.

If GPT-4 can’t keep up with coding demands and make it faster to find information than Google, it has no more worth to me than some babble-bot like Tay or Replika. I have plenty of other outlets for amusement or low-intensity cognitive tasks. I needed GPT-4 for tough stuff that it is no longer capable of doing.

3 Likes

Honestly, ChatGPT-4 is borderline unusable at this point… it is so unbelievably dumb… what a shame! What happened!!! It literally doesn’t understand a single word 99% of the time now. I’ve prompted and re-prompted and it is just not getting it like it once did…

3 Likes

I think it is being throttled, and sometimes you get enough GPU to match what it used to do months ago. It is odd how bad it has been in the last few days, though: stuck in logic loops, without enough context to keep the code I upload or paste in view for more than one prompt afterward. It keeps giving a solution it already proved wrong, then suddenly forgets what we were working on. Lots of “you need to upload the file again” right in the prompt I uploaded it in :/. This is terrible.

Yet in the last few days it did at times suddenly output an entire correct code refactor with fixes that were, as before, excellent. It feels as if it can still do it, but it takes a lot of patience and focus. If I keep hammering at it about a problem, there is finally a window of time where the jackpot comes out: the code is good and, boom, the problem is solved like before. Those moments are getting fewer and fewer lately. Keeping the context up until then is difficult and a waste of time. I am sure this overall wastes more GPU for OpenAI, by making us push it through a lot of extra cycles instead of it just doing the job and being done.

That’s… just not how any of this works. You can’t “throttle” the model or “use” less GPU resources.

I am not sure what it is; I’m just trying to figure it out. What would reduce the context and allotted “power” of ChatGPT?

It’s literally that the context gets small and the number of “cycles” it will give me per question decreases and increases depending on the time of day (peak hours) and other unknown factors. I see this behavior; I am a heavy user, so I am “feeling” it do this. I am not imagining it; somehow it is being “reduced” in “power” at times.

1 Like

I’ve experienced exactly what you’re talking about. It’s so odd… there are moments where the true AI genius that was GPT-4 from a few months ago shines through… and other times it’s a joke… and I’ve experienced it apologising A LOT, A LOT!!!

2 Likes