GPT-4 API Gateway timeout for long requests, but billed anyway

Yes, I am absolutely sure. I haven't had a single successful request today or yesterday, yet there's around €1.50 in my billing overview. I haven't made any other requests on the API aside from one test request with about 15 tokens total. I updated the original post with the response I'm getting.

I fell back to 3.5. GPT-4 was unusable today.

I would love to, but my request content is too long :expressionless:

I've experienced the same issue (gateway timeouts, charged anyway), and as usual received zero response from the support team on it. Super frustrating, especially since I'd had queued jobs set up the first time and didn't realize I was getting charged for the non-completes.

Side note: has anybody ever received an answer via chat? The only time I've had somebody respond to me was when I emailed about an Acceptable Use Policy question, but I've never had a chat response from support.

Same problem here. The GPT-4 API is non-functional for longer requests, and I'm getting billed for them.

Error: Gateway timeout.
{"error":{"code":524,"message":"Gateway timeout.","param":null,"type":"cf_gateway_timeout"}}
HTTP status: 524
Headers: {'Date': 'Thu, 04 May 2023 09:37:15 GMT', 'Content-Type': 'application/json', 'Content-Length': '92', 'Connection': 'keep-alive', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'same-origin', 'Cache-Control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Expires': 'Thu, 01 Jan 1970 00:00:01 GMT', 'Server': 'cloudflare', 'CF-RAY': '7c1fb4efd836b951-AMS'}
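For anyone scripting against this, here's a minimal sketch of catching this failure with the pre-1.0 `openai` Python library. The version and error mapping are assumptions on my part; the Cloudflare 524 appears to surface as `openai.error.APIError` rather than a clean timeout:

```python
# Sketch only: assumes openai<1.0 (e.g. 0.27.x). The 524 cf_gateway_timeout
# seems to arrive as an APIError, while request_timeout raises Timeout.
import openai

openai.api_key = "sk-..."  # your key

try:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "long prompt here"}],
        request_timeout=600,  # client-side limit in seconds
    )
except openai.error.Timeout as e:
    print(f"Client-side timeout: {e}")
except openai.error.APIError as e:
    # The cf_gateway_timeout above lands here; note that the failed
    # request may still show up in the usage/billing dashboard.
    print(f"API error: {e}")
```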
Same problem today. I also get billed for each failed attempt.

The issue persists as of today for me.
And I can’t even publish this message because apparently it is incomplete.

Same issue here. We are getting billed for requests whose responses never arrive. Plus, we have to shorten the text to get fewer timeouts, which increases the overall prompt length across all the extra requests, if that makes sense.

I am running into the exact same issue. My requests are ~6.5k tokens at the moment, and I send them in batches of 6 requests, with a delay of 2-15 seconds per request and a 60-second delay after each batch before starting the next one.

Pretty frustrating because I don't think I'm doing anything wrong… has anyone figured out a working solution to this without dramatically decreasing request lengths?
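For reference, here is roughly what my pacing looks like; a sketch using the pre-1.0 `openai` Python library, where the helper name and exact delays are just illustrative:

```python
# Illustrative pacing: 6 requests per batch, a short delay between
# requests, and a 60 s pause between batches. Assumes openai<1.0.
import time
import openai

def run_batches(prompts, batch_size=6, per_request_delay=5, batch_delay=60):
    for start in range(0, len(prompts), batch_size):
        for prompt in prompts[start:start + batch_size]:
            try:
                openai.ChatCompletion.create(
                    model="gpt-4",
                    messages=[{"role": "user", "content": prompt}],
                    request_timeout=600,
                )
            except openai.error.OpenAIError as e:
                # Failed calls still seem to be billed, so log rather
                # than retry blindly.
                print(f"Request failed: {e}")
            time.sleep(per_request_delay)  # 2-15 s in my setup
        time.sleep(batch_delay)
```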

I am also getting this error for some of my GPT-4 API requests that are ~3,500 tokens or more. I tried increasing request_timeout to 40 minutes, but the request still sits there for 40 minutes and then times out.

I tried streaming my response to see where it breaks.

Input: 3,551 tokens
Output: 2,155 tokens

I tried the same request multiple times and it seems to crash around the same part of the output every time.
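In case it helps anyone reproduce this, here is a sketch of how I stream the response to watch where it dies (pre-1.0 `openai` library; `prompt` stands in for my actual input):

```python
# Stream deltas so a mid-generation failure shows exactly how much
# output was produced before the drop. Assumes openai<1.0.
import openai

collected = []
try:
    for chunk in openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        delta = chunk["choices"][0]["delta"]
        collected.append(delta.get("content", ""))
except Exception as e:
    print(f"Stream broke after {len(''.join(collected))} characters: {e}")

first_chunk_of_response = "".join(collected)
```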

As a temporary workaround, I ask ChatGPT to continue and feed it the previous message plus the output so far. (I tested this in the Playground; I will see if it works with the API.) I'll have to pay for an extra ~6K input tokens every call, but at least my app will be working.

Update: This method worked as a temporary workaround.

I have it stream the response so I can see when/where it crashes instead of it timing out. Then I make a new request with the same prompt and the response I received so far:
[{"role": "user", "content": prompt},
 {"role": "assistant", "content": first_chunk_of_response}]

And it picks up where it left off.
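Concretely, the follow-up call looks something like this (same assumptions as the streaming sketch above):

```python
# Re-send the original prompt plus the partial output as an assistant
# message; the model continues from where the stream broke off.
continuation = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": first_chunk_of_response},
    ],
)
full_response = (first_chunk_of_response
                 + continuation["choices"][0]["message"]["content"])
```

The downside, as noted, is paying for the partial output again as input tokens on every continuation call.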

This is also happening to my larger API requests when experimenting with GPT-4. When I use the same prompt in the Playground, it seems to work fine. As some people here mentioned, I checked the usage data and can also confirm that requests that are not fulfilled are billed! This could have gotten expensive if I had left my queue workers running, since they are set up to retry API requests…
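If you do keep retry workers running, here is a sketch of capping retries so a billed-but-failed request isn't repeated indefinitely (hypothetical helper, pre-1.0 `openai` library):

```python
# Cap retries: each failed attempt may still be billed, so give up
# after a couple of tries instead of looping forever.
import openai

def create_with_capped_retries(messages, max_retries=2):
    for attempt in range(max_retries + 1):
        try:
            return openai.ChatCompletion.create(model="gpt-4", messages=messages)
        except openai.error.APIError as e:
            if attempt == max_retries:
                raise
            print(f"Attempt {attempt + 1} failed: {e}; retrying")
```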

Has anyone else contacted support over this? It’s been 14 days without a reply.

I did, but only yesterday. I’ll post here if I get a response.

No response yet. Replying to bump the thread for OpenAI staff to hopefully see.

I have been having the same problem. Unfortunately it makes a new app I'm prototyping useless :frowning: For me, it only happens with large-context (~6k token) requests.

21 days after the first post here, I still have the same problem. For big content (500 pages, ~100,000 words) it is a major problem. 1. The costs to charge the customers are too high and unpredictable if errors like this come up. 2. If you get a timeout or something goes wrong, you can't resume and have to start from the beginning. For now I have no clue whether this issue kills my business model or whether I can find a smart solution.

I just really wonder how they plan to serve the 32k model if even 4k+ token requests cause intermittent timeouts. Maybe that's why the roll-out is so slow.

No reply from support yet, by the way.

They serve the low-hanging fruit first. And maybe not many people have that use case.

I'm experiencing a similar issue where I receive a timeout error if the process exceeds 600 seconds. I've adjusted my request to be processed in smaller chunks, which works. However, selecting a chunk size is a guessing game where each try takes 10 minutes and costs me $4. At the very least, don't charge us for errors, rate limits, overloads, and timeouts.
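One way to take the guesswork out of chunk sizing is to split by token count with `tiktoken` (OpenAI's tokenizer library); the 2,000-token chunk size here is just an assumption to tune:

```python
# Split text into chunks of at most max_tokens tokens, so each request's
# size is known up front instead of guessed.
import tiktoken

def chunk_by_tokens(text, model="gpt-4", max_tokens=2000):
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]
```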

I don't even have access to GPT-4, but on average I have to wait around 20 seconds for replies. Also, I only receive replies when I use the default GetResponseFromChatbotAsync method (from the unofficial C# lib I'm using). When I specify endpoint details like the model, number of tokens, temperature, etc., I usually get timeout errors, even when I specify the fastest models and a small number of tokens. The GetResponseFromChatbotAsync method I mentioned never times out, but sometimes I have to wait 30 seconds or so. I'll have to investigate further to figure out what's different between these requests and why some fail and some don't, even after a longer wait. My guess is it's the default timeout period used in the different methods or endpoints, because I'm definitely not exceeding the token cap per unit of time.