When will the response time/timeout issue be addressed?

I’m having significant issues with chat completions hitting the 600-second timeout with the GPT-4 API. These completions are 5,000–6,000 tokens, well within the 8,000-token context window.

These issues coincide just about perfectly with the average completion times seen here. Any time the average gets near ~40 seconds or above, I can count on consistent errors. Even with error handling that retries the completion, it will just error out again on the 2nd or 3rd attempt as well.

Browsing this forum, I can see I’m far from the only one who has had this issue in recent months. I get that this is a newer product, but the server speeds don’t seem anywhere close to what they need to be. Unless my math is off, you would need to average roughly 256 tokens every 20 seconds (about 13 tokens per second) to finish an 8k completion in 600 seconds, but the average times are never anywhere near that.
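For anyone who wants to double-check the arithmetic above, a quick sanity check (the 8,000-token figure and 600-second timeout are taken from this thread; everything else is just division):

```python
# Throughput needed to finish a full-context completion before the timeout.
context_tokens = 8000   # 8k completion (worst case from the post)
timeout_s = 600         # hard timeout threshold

required_tps = context_tokens / timeout_s   # tokens per second
per_20s = required_tps * 20                 # tokens per 20-second window

print(f"{required_tps:.1f} tokens/s, i.e. ~{per_20s:.0f} tokens per 20 s")
```

So the break-even rate is about 13.3 tokens/s (~267 tokens per 20 seconds), which matches the ballpark figure above.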

Again, I know the GPT-4 API has only been in limited release since July, but a ~15–20% failure rate in production is beyond frustrating. Does anyone know if this is being worked on, or will be in the future? I can’t find anything about it in OpenAI’s patch notes or press releases.

If it’s being worked on, it’s by training the AI to curtail and deny long outputs.

Use streaming, and you’ll at least get the partial answer generated during the five minutes before the error, instead of paying and getting nothing.
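A minimal sketch of that idea: accumulate the streamed deltas as they arrive, and if the connection dies mid-generation, keep whatever text made it through instead of discarding the request. The helper name `collect_partial` is mine, and the commented-out client call is just one way to wire it up with the OpenAI Python library's `stream=True` option:

```python
def collect_partial(deltas):
    """Accumulate streamed text deltas; if the stream errors out
    (timeout, dropped connection), return whatever arrived so far."""
    parts = []
    try:
        for text in deltas:
            if text:  # delta content can be None/empty on some chunks
                parts.append(text)
    except Exception:
        pass  # keep the partial output rather than losing everything
    return "".join(parts)

# Hypothetical wiring against the OpenAI Python client (v1 style):
# stream = client.chat.completions.create(model="gpt-4",
#                                         messages=messages, stream=True)
# partial = collect_partial(chunk.choices[0].delta.content
#                           for chunk in stream)
```

Even on a failed request you then have `partial` to show the user or to feed into a retry.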

You can also set max_tokens below the point that triggers the error, then resubmit with what it wrote so far appended as a new assistant message.
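A sketch of that resubmission step, under the assumption that appending the truncated output as an assistant turn plus a short "continue" prompt is enough to make the model pick up where it stopped (the helper name and the nudge wording are mine, not an official pattern):

```python
def continuation_messages(messages, partial_text,
                          nudge="Continue exactly where you left off."):
    """Build the message list for a follow-up request: the original
    conversation, the truncated completion as an assistant turn, and a
    user nudge asking the model to resume."""
    return messages + [
        {"role": "assistant", "content": partial_text},
        {"role": "user", "content": nudge},
    ]
```

You would then call the chat completions endpoint again with this list (and the same capped max_tokens), concatenating the pieces yourself, and repeating until the model finishes naturally.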
