GPT-5 with reasoning set to high is timing out

Any time I set gpt-5 reasoning.effort to high, I almost always get a timeout error.

Currently, it works perfectly fine when set to medium or below, but 95% of the time I get a timeout when it is set to high.

How am I supposed to work around this? I tried running it as a background process, but the same issue seems to occur there as well. This has been going on for about a month now; I posted about it hoping it would be fixed within a month or so, but the issue is still ongoing.

How are people using gpt-5 if this doesn't work? Why aren't more people running into this same issue?


The AI model can run for a long time without a response or any network activity.

If you are running on a cloud platform not 100% in your control, they may be enforcing their own timeout on you. Some are as low as 60 seconds.

The default timeout of the OpenAI API SDKs, 15 minutes, is also not long enough to ensure success on a difficult task combined with high reasoning effort and the low priority and slow token production rate of typical requests, even without "service_tier": "flex" trading performance for lower cost.

Also, a LOT of tokens can be used in reasoning. Don't cap the output with max_output_tokens, or the reasoning may consume the entire output budget and you never get a visible response.
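To see how much of your output budget reasoning alone ate, you can look at the usage block of a response. A minimal sketch, assuming the Responses API reports reasoning tokens under `usage.output_tokens_details.reasoning_tokens` (the helper function itself is hypothetical):

```python
# Hypothetical helper: how many output tokens were visible text,
# versus consumed by hidden reasoning. The field layout follows the
# Responses API usage object (an assumption worth verifying).
def visible_output_tokens(usage: dict) -> int:
    reasoning = usage["output_tokens_details"]["reasoning_tokens"]
    return usage["output_tokens"] - reasoning

# Example: if a 2000-token cap is fully consumed by reasoning,
# nothing visible remains.
usage = {"output_tokens": 2000,
         "output_tokens_details": {"reasoning_tokens": 2000}}
print(visible_output_tokens(usage))
```

If that number is at or near zero while output tokens are maxed out, the cap, not the network, is why you never saw an answer.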

How are people making calls? How about I make one myself? Something like this depiction: Chat Completions, your question; output tokens: 7504 (on the low end):

GPT-5 fabricated a model name and method, maybe from seeing gpt-4.1 in OpenAI's supervised training attempts, and maybe just from being a 4-bit token guesser. It does show the correct method for setting a timeout on the client, but 3600 seconds is more like where you want to be. max_retries=2 is already the default, and it is actually better to set it to 0 so you don't pay for unseen errors that OpenAI masks.
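For reference, a minimal sketch of that client setup, assuming the official `openai` Python SDK; the helper that bundles the settings is my own, and the values are the ones suggested above, not SDK defaults:

```python
# Hypothetical helper collecting the client settings suggested above;
# pass the result to OpenAI(**kwargs) from the official openai SDK.
def long_call_client_kwargs() -> dict:
    return {
        "timeout": 3600,   # one hour, instead of the SDK default
        "max_retries": 0,  # don't pay for retried, unseen errors
    }

# Usage (assumes the openai package and an OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI(**long_call_client_kwargs())
```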

There is another fault with the model or the API rate limiter: on large context input it can often just hang forever without doing anything. A call that should either run or report that you are over the 272k-token input limit instead never gets back to you.

"Background process" doesn't quite tell me that you are using the "background" parameter along with store: true. You can use that parameter to set a call in motion, then poll and pick up your response later using the response ID. That may be the best solution if your other parameters and inputs are sound.
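That create-then-poll pattern can be sketched like this, assuming the Responses API's background=True and store=True parameters; the polling helper is hypothetical, and `retrieve_fn` stands in for `client.responses.retrieve` so the loop itself can be shown without a live API key:

```python
import time

def poll_background_response(retrieve_fn, response_id,
                             interval_s=10.0, max_polls=360):
    """Poll a stored background response until it leaves a non-terminal
    status ('queued' / 'in_progress'), then return it."""
    for _ in range(max_polls):
        resp = retrieve_fn(response_id)
        if resp["status"] not in ("queued", "in_progress"):
            return resp
        time.sleep(interval_s)
    raise TimeoutError(f"{response_id} still running after {max_polls} polls")

# Real usage would look roughly like (untested assumption of the SDK shape):
# resp = client.responses.create(model="gpt-5", input=...,
#                                reasoning={"effort": "high"},
#                                background=True, store=True)
# final = poll_background_response(
#     lambda rid: client.responses.retrieve(rid).to_dict(), resp.id)
```

Because the work happens server-side, no single HTTP connection has to stay open for the whole reasoning run, which is the point of background mode.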


- Split tasks into smaller parts
- Use medium reasoning for most steps, high only when needed
- Reduce prompt size or simplify instructions
- Retry requests or add small delays

Has anyone found a reliable way to use high reasoning without timeouts?