Seriously, anyone else?
I’m using Chat Completions (which is usually faster for me than Responses) and querying GPT-5 with prompts often just under the max token load (i.e. I’m not getting the kickback from the OpenAI server saying the prompt + response went over the context window/token limit).
So I’m querying with, say, 180k tokens, at high verbosity and medium reasoning. Occasionally I can get the same context window to work with “low” reasoning on GPT-5, but often that stalls the same way (and I’ve tried all the verbosity settings; most of the time they make no difference for these “hung” conditions).
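For reference, this is roughly the shape of the call I’m making. A sketch only: `build_request` is my own illustrative helper (not an SDK function), and I’m assuming the GPT-5 Chat Completions parameter names `reasoning_effort` and `verbosity`:

```python
# Illustrative helper only -- build_request is my own name, not part of the SDK.
def build_request(messages, effort="medium", verbosity="high"):
    """Assemble the kwargs I pass to client.chat.completions.create()."""
    return {
        "model": "gpt-5",
        "messages": messages,
        "reasoning_effort": effort,  # "low" occasionally gets through
        "verbosity": verbosity,      # changing this hasn't helped the hangs
    }

kwargs = build_request([{"role": "user", "content": "..."}])
# client.chat.completions.create(**kwargs)  # <- this is the call that hangs
```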
I increased the timeout on my Python SDK calls to the /completions endpoint from 10 minutes to 20.
So now I just wait 20 minutes for it to time out instead of 10.
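For context, this is roughly how I’m setting that timeout, a sketch assuming the v1.x `openai` Python package (which accepts a timeout at client construction, or per request via `with_options`):

```python
import httpx
from openai import OpenAI

# 20-minute overall deadline, with a shorter connect timeout (httpx.Timeout shape)
client = OpenAI(timeout=httpx.Timeout(1200.0, connect=10.0))

# ...or override it per request:
# client.with_options(timeout=1200.0).chat.completions.create(...)
```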
Note: for temporary debugging, I’ve increased the timeout even longer under certain conditions and then waited again, say 30 minutes, for not even a single token in response (when testing with streaming). But I don’t like to sit around and play that game; I usually just switch to a different model to push the conversation through that moment, and then switch back to GPT-5 afterwards. Which, incredibly, often works. That leads me to believe the complexity of the question I’m asking is causing some kind of hang-up/reasoning loop, because the next prompt will usually differ by only +/- 1000 tokens: otherwise identical context windows, just “one message further along” in a long-running multi-turn automated coding project flow.
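My workaround loop looks roughly like this. A hedged sketch, not SDK code: `first_token_within` and `answer_with_fallback` are hypothetical names of my own, and `ask` stands in for whatever function actually opens the streamed request and returns the first token:

```python
import concurrent.futures

def first_token_within(stream_fn, seconds):
    """Run stream_fn with a hard deadline; return its result, or None on timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(stream_fn)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        # The worker thread is abandoned here, mirroring the hung HTTP request.
        return None
    finally:
        pool.shutdown(wait=False)

def answer_with_fallback(models, ask, deadline_s=1200):
    """Try each model in order; return (model, first_output) from the first
    one that produces anything before the deadline."""
    for model in models:
        result = first_token_within(lambda: ask(model), deadline_s)
        if result is not None:
            return model, result
    raise RuntimeError("no model responded in time")
```

So a call like `answer_with_fallback(["gpt-5", "o3"], ask)` gives GPT-5 its full deadline for a first token, then pushes the same turn through o3 when it hangs.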
This is often with relatively complex, multi-faceted coding tasks where I’m sharing 20+ files in the context window, but only 1-2 previous user/assistant messages from the conversation.
Anyone else having these kinds of issues?
It seems very specific to GPT-5. All the other models, whether reasoning or non-reasoning, do eventually respond successfully to the same prompt/context, often within 3-5 minutes max for other reasoning models like o3.
But GPT-5 seemingly just won’t do it. I tried the same with streaming and didn’t even get a first token after 20 minutes.
Of course, since I’m not using the Responses API, I can’t see the “reasoning” that may or may not be being generated.
But is there any other known issue with the Chat Completions endpoint and GPT-5?
It seems fairly insane/unreasonable to ever wait 20 minutes, or possibly wait indefinitely on a hung server with no response?
@vb @_j you guys helped me with this before, and I increased my timeout using the SDK. But I’m basically still having the exact same issue, except now I wait longer for no response whatsoever from the OpenAI server!
@stevecoffey I saw you address something similar on a different post a few weeks ago; in that case you recommended someone turn off “store” in the Responses API so that it “acted like” the Chat Completions endpoint.
But again, my problem is:
TL;DR:
Chat Completions endpoint: infinite wait times for GPT-5, with no response or errors from the server, under large context window loads (often with medium reasoning and medium or higher verbosity). If I push the context window load beyond the limit, I do get those errors.
I’ve been using the Chat Completions endpoint for about a year to the tune of a couple thousand bucks, so I’m not a total noob with the SDK or the system overall. I’ve tried debugging the issue: I updated my OpenAI SDK packages to the latest version and rewrote all my caller code to be compliant with the new v1+ package (I was previously on an older version, as @_j pointed out to me a couple weeks ago).
Thanks for any thoughts or help


