I have a Plus subscription and also use the API. With the Plus subscription, in the ChatGPT UI, I never had any issues with GPT-3.5 (only with GPT-4 being slow sometimes).
But I created a simple Python script to generate some short responses based on my prompts (taken from a Google Sheet or CSV), and it's really slow. It's taking around 30-60 seconds per request, while in the chat UI it's instant.
And it usually times out. If I have a large file with 10-20 entries, it doesn't finish in one go; it times out or crashes in some other way (at one point it gave me a Cloudflare-related error message).
Is anyone having similar issues? Is there any way to fix them?
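One thing that may help with the timeouts and crashes is wrapping each request in a retry loop with exponential backoff, and capping how long a single call can hang. This is only a sketch: the `with_retries` helper is mine, the commented usage is illustrative, and the `request_timeout` parameter's exact behavior depends on your client library version.

```python
import time

def with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(); on failure, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Hypothetical usage (names and parameters are illustrative; the
# request_timeout argument caps how long a single API call can hang):
# response = with_retries(lambda: openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": prompt}],
#     request_timeout=60,
# ))
```

Combined with writing each result to the output file as soon as it comes back (instead of only at the end), a crash partway through a 10-20 entry file no longer loses the rows that already finished.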
Yes, indeed, it's intermittent and probably due to high load on the servers, but it's always very slow. I mean, 30-60 second response times on the API when the chat UI is almost instant…
I have the same problem. The service responds very quickly when it is first started, but after a period of disuse it becomes very slow when used again.
Is it possible that the GPT servers are overloaded during the day, affecting the API's response times? Responses through the ChatGPT Plus UI, however, are very fast.
I have just the same experience. My ChatGPT Pro is fast (also the legacy bot), but my paid API subscription takes approximately 40-50 seconds for 550 tokens, and it has been like this for some weeks now.
We have developed a simple UI connected via API to ChatGPT 3.5-turbo, and it stopped providing output a couple of days ago; before that it worked fine. The input queries did not change. Could you advise where to start looking in order to fix this?
Inference times for the various models can change with the time of day and week as load on the system varies. They should in general trend downwards as more compute is added, but it is important to understand that this is a shared resource, and there may be differences in performance over time.
However, I am consistently experiencing very slow responses for every request I submit (typically around 90 seconds to generate some 1,000 tokens). What should I do in this case? Any thoughts?
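One way to pin this down is to time every request and log it, so you can see whether the 90 seconds is constant or load-dependent. A minimal sketch, where `generate` is a placeholder standing in for whatever function actually calls the API:

```python
import time

def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# Hypothetical usage, where generate(prompt) wraps your actual API call:
# answer, elapsed = timed(lambda: generate(prompt))
# print(f"request took {elapsed:.1f}s")
```

Separately, if time-to-first-token matters more than total time, passing `stream=True` to the chat completion call lets you start reading tokens as they arrive, which is part of why the ChatGPT UI feels so much faster.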