GPT-4 API Gateway timeout for long requests, but billed anyway

Sounds like you have multiple different, unrelated issues.
The 20-second wait sounds normal to me, depending on your request length - the model is still generating the whole response token by token, so you have to wait for it to finish before you receive anything. The 3.5 Turbo model in the API isn't nearly as fast as the one in ChatGPT, in my experience.
Try streaming - you'll start receiving tokens as soon as generation begins instead of waiting for the full response. The library you're using has example streaming code in its README.
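For anyone who wants to try it, here is a minimal streaming sketch using the pre-1.0 `openai` Python library (the interface current at the time of this thread); the model name and prompt are just placeholders:

```python
import os

def collect_stream(chunks):
    """Join the incremental 'delta' pieces of a chat-completion
    stream into the full response text."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

if os.environ.get("OPENAI_API_KEY"):
    import openai  # pre-1.0 interface; newer versions use openai.OpenAI()

    stream = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello."}],
        stream=True,  # chunks arrive as tokens are generated
    )
    print(collect_stream(stream))
```

In practice you'd print each delta as it arrives rather than collecting them, but either way the first token shows up within a second or two instead of after the whole completion.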

Deprioritize GPT-4 for the $20 plan and take care of the API customers who pay several hundred dollars!


I am running into the same problem. The GPT-4 API is unusable for large requests that are well under the token limit. I run my requests through the parallel-processing script they supplied in the OpenAI Cookbook, chunk my documents with tiktoken, and still get timeouts - and I'm still billed. I reached out to support, but after reading all of these comments I am skeptical about getting a reply.
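For what it's worth, the chunking step can be written tokenizer-agnostically; here is a sketch where `encode`/`decode` stand in for a real tokenizer such as tiktoken's `enc.encode`/`enc.decode`:

```python
def chunk_by_tokens(text, max_tokens, encode, decode):
    """Split `text` into pieces of at most `max_tokens` tokens.
    `encode` maps text to a token list; `decode` maps tokens back
    to text (with tiktoken: enc.encode and enc.decode)."""
    tokens = encode(text)
    return [
        decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```

With tiktoken you would obtain the tokenizer via `tiktoken.encoding_for_model("gpt-4")` and pass its `encode` and `decode` methods; note that chunking at hard token boundaries can split mid-sentence, so splitting on paragraphs first is usually nicer.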


Same issue for me. The GPT-4 API is unusable and is putting off my users :frowning:


Same issue: 650 API calls (200 prompt tokens and 200 completion tokens each), run in parallel, are taking more than 2 hours… and I'm still waiting…

The slowdown seems to be resolved for me. A call with 200 tokens took 30+ seconds three days ago; now it takes less than 2 seconds.

Moreover, I completely redesigned my script yesterday (gave up on GPT-4 and focused on GPT-3.5 Turbo). The API was definitely "hanging" my script significantly, so I added a lot of error handling and backoff. Running 650 calls took me 3 hours three days ago and now takes around 8-15 minutes, so I think some of the API issues were resolved.
(I assume there is less stress on the system now that it's the weekend.)
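The error handling and backoff described above can be sketched as a small retry wrapper - a generic pattern, not the poster's actual code:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Call `call()` and retry on any exception, sleeping with
    exponential backoff plus jitter between attempts. Re-raises
    the last error once all retries are exhausted."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # 1x, 2x, 4x, ... the base delay, with up to +100% jitter
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```

You would wrap each API call in a lambda (e.g. `with_backoff(lambda: openai.ChatCompletion.create(...))`); in real code you'd likely only retry timeout and rate-limit errors rather than every exception.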

My calls are around 6k tokens and very rarely succeed; I need to make 4 or 5 attempts to get one through. Worst of all, I'm charged for the failed responses. They should definitely not be charging us.

Same for me.
Failures on large requests aren't the end of the world, since we can just start over, but please don't charge for them.


Having the exact same issues, still being billed as well :frowning:


I'm just here to say that I'm dealing with the same issues as everyone else, and the lack of transparency was killing me. As a result, a friend and I collaborated on an open-source platform to address it, focusing on observability and troubleshooting features. Initially we relied on Slack messages for error notifications, but we soon realized that wasn't scalable, particularly when handling more than 100 requests per day. If you're interested, check out Pezzo on GitHub.

This is still going on for me, almost 3 weeks after my initial post. No response from staff yet :frowning: Ouch

I rarely can get even a single request through (mostly get 524 or 502 errors), and I’m being billed for everything, which is fraudulent.

This is very worrying: I'm developing a product and I'm not at all sure I'll be able to ship anything, given that the API essentially doesn't work.


I gave up on my initial idea and structured my requests differently so that they use vastly less context (but the quality of responses also decreases). A real shame that they don’t care.

And "fraudulent" is a good word. For me it's only been a couple of euros for testing purposes, but this could have gotten expensive.

Check here.

I wonder if there is a connection with the "continue" button they deployed on the web version.
Indeed, I think my request is still waiting on the server for a "continue" action, but it's impossible to trigger that via the API, since I get no feedback from OpenAI.
I just get a timeout with an empty result and a higher bill for delivering nothing…

If so (I have no idea), the API should return a response requesting a "continue" action, so that it can then deliver the full content…

I don't think so; this error was already occurring weeks before they added that feature.
And I'm not sure ChatGPT and the API are comparable in terms of features. You could already make a "continue"-style request via the API before that; I believe the ChatGPT frontend just checks the finish reason of the response and adds the button when the response was cut off prematurely.

Maybe…
But the behavior looks very similar to me: on the web version, no server response tells you whether a "continue" action is needed; only when you click the button can you see a POST that lets the server resume the generation.
With the API I think I have the same problem - no response from the server, even with a timeout of 600 seconds.

If that's the case, it would only take a small change to the API: return a "break" state, and accept a POST on the conversation ID to resume the generation.


You know what, you may be on to something.
Maybe it all works out if you use the streaming API and, as soon as an error occurs, fire another request with all the data received so far (and everything before it too, of course).
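That idea can be sketched as a small helper that builds the follow-up request - purely hypothetical, since there is no official resume feature; it just replays the conversation with the partial output as an assistant turn:

```python
def build_resume_messages(original_messages, partial_text):
    """Build the message list for a follow-up request after a
    stream died mid-response: replay the original conversation,
    append what we already received as an assistant turn, and
    ask the model to pick up where it left off."""
    return original_messages + [
        {"role": "assistant", "content": partial_text},
        {"role": "user", "content": "Continue exactly where you left off."},
    ]
```

On a stream failure you would accumulate the deltas received so far, pass them in as `partial_text`, and send the returned list as the `messages` of a new request; the seams between segments won't always be perfect, but it salvages most of an otherwise lost (and billed) response.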

Bump. Still hoping to get a staff response or official reply here.

Does ChatGPT only use GPT-4? I have the strange feeling it might also draw on embeddings from previous conversations (or those may have fed back into the model's training, or my prompts have changed - it's just a feeling). I guess we'll never know.
I mean, GPT-4 API answers differ from ChatGPT answers (obviously also because the system prompt does something, and maybe the moderation endpoint, when triggered, slightly changes the answer?).