ChatGPT-4 painfully slow

It's still pretty slow; I hope they fix this soon.

Same here. ChatCompletion requests that took 5-10 seconds about two weeks ago now take 30-40 seconds to complete for gpt-4 on a paid account. User experience goes down the drain with that kind of delay.
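For anyone who wants to put numbers on this, here is a minimal timing sketch using the Python openai package (pre-v1 style); the API key and prompt are placeholders:

```python
import time
import openai

openai.api_key = "sk-..."  # placeholder: your API key

start = time.perf_counter()
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "write a limerick about a dog"}],
)
elapsed = time.perf_counter() - start

# latency scales with how many tokens the model generates,
# so log both numbers together
print(f"latency: {elapsed:.1f}s, "
      f"completion tokens: {response['usage']['completion_tokens']}")
```

Logging latency alongside completion tokens makes it easier to tell whether slow requests are simply long responses or genuinely slower token generation.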


Why is ChatGPT-4 so bad at code now? It can't remember what I prompted. 3.5 is better.


Has anyone had an update on GPT-4's slow API responses? Mine are getting worse, which is frustrating since 4 is soooo much better at what I am asking it to do.

I am using the API with a key I paid for

My response times are on the order of a minute for GPT-4. It depends on how much I give it to chew on.

Using it in completions mode/style (no prior context) I can get quicker responses; simple queries like "write a limerick about a dog" take about thirty seconds.

Refining the result in a chat context quickly starts consuming time and tokens.
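If the full round-trip wait is the problem, one thing that helps the user experience (a sketch, not official guidance) is streaming the response so tokens appear as they are generated. Assuming the same Python openai package, model name, and example prompt as above:

```python
import time
import openai

openai.api_key = "sk-..."  # placeholder: your API key

start = time.perf_counter()
stream = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "write a limerick about a dog"}],
    stream=True,  # deliver the response chunk by chunk
)

first_token_at = None
for chunk in stream:
    content = chunk["choices"][0]["delta"].get("content")
    if content:
        if first_token_at is None:
            # record time to first token
            first_token_at = time.perf_counter() - start
        print(content, end="", flush=True)

print(f"\nfirst token after {first_token_at:.1f}s, "
      f"total {time.perf_counter() - start:.1f}s")
```

Total generation time is unchanged, but the first words usually arrive within a few seconds, which feels much less like a timeout.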

I am happy with it, once I realised it was not timing out and learnt to wait.

It's the 11th of May and ChatGPT-4 has become almost unusable due to poor performance. I'm not sure I can justify $20 a month for a substandard service.

It's really annoying. If I had been aware of this, I wouldn't have upgraded. I upgraded because I needed better performance, but got this instead!

Does anyone still experience errors with GPT-4?

I just started paying for GPT-4 and it SUCKS!! Slow. Times out all the time. Doesn't respond anything like as well or as fast as the free version. Why am I paying for a beta version that is so full of bugs it barely works???

Same here. I'm paying for it and it's unusable.

Seconding everything that has been said above. If there is not going to be a fix very soon for GPT-4's poor generation speed, I will be cancelling my subscription. It's borderline unusable for more demanding interactions that require lots of text to be generated, and GPT-3.5 does the same work an order of magnitude faster. Is there someone from the core team who could actually comment on this? It seems like paying members are talking to a wall here.

GPT-4 is a higher-computation model by design. It was even slower at release, when it ran at its full intellectual power.

You can run an open-source language model on high-end consumer hardware, at the size limit of what your machine can handle, and still get far lower token output per second even when it is the only thing running. Now scale that up: an H100 inference server with 2 TB of RAM and 640 GB of GPU memory costs around $250,000, and a rack instance serving GPT-4 runs toward $1 million - a supercomputer you get access to for a minute of computation to answer a question about silly ASCII art.

Just because it looks simple doesn't mean it is simple.