Hi everyone,
I’m experiencing an intermittent issue with GPT-5/Reasoning models that I can’t quite pin down. My server is located in Vietnam, calling OpenAI via global endpoints.
The Setup:
- TTFT Timeout: 60s (Normal) / 120s (Reasoning).
- Global Timeout: 300s.
The Problem: Most requests work fine, but intermittently (not every time), a request hits the 300s global timeout.
- In some cases, the first token arrives within 120s, but the stream then “hangs” or crawls so slowly that the request exceeds 300s.
- In other cases, it times out before the first token ever arrives.
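To tell a true TTFT timeout apart from a mid-stream stall, one idea is to wrap the chunk iterator in a watchdog that raises as soon as the gap between consecutive chunks exceeds a threshold. A stdlib-only sketch (`stream` is any iterator of chunks, e.g. the one the SDK returns; the names are mine):

```python
import queue
import threading

class StreamStallError(TimeoutError):
    pass

def watch_stream(stream, stall_after=30.0):
    """Yield chunks from `stream`, raising StreamStallError if the gap
    between consecutive chunks exceeds `stall_after` seconds."""
    q = queue.Queue()
    _DONE = object()

    def pump():
        try:
            for chunk in stream:
                q.put(chunk)
            q.put(_DONE)
        except Exception as exc:  # forward iterator errors to the consumer
            q.put(exc)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=stall_after)
        except queue.Empty:
            raise StreamStallError(f"no chunk for {stall_after}s") from None
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

With this in place, a mid-stream hang produces a distinct `StreamStallError` with a timestamp, instead of silently eating the remaining global budget.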
I’m trying to identify the root cause:
- Is anyone else seeing this, especially when calling the API from Southeast Asia?
- Is it the model? Does GPT-5 occasionally “stall” mid-response or take long reasoning pauses during a stream?
- Is it the network? Could intermittent packet loss on undersea cables (VN to US/EU) cause a “silent hang” where the connection stays open but data stops flowing?
- Is it my code? How can I verify whether my local aggregator node is failing to catch an “End of Stream” signal?
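On the last point: for Chat Completions-style SSE streams, the server terminates with a `data: [DONE]` sentinel line (the Responses API instead emits typed terminal events such as `response.completed`). One way to rule the aggregator in or out is to log, per request, whether that sentinel was ever observed. A sketch over raw SSE lines (adapt to however your aggregator reads the wire):

```python
def aggregate_sse(lines):
    """Collect SSE data payloads and report whether the stream was
    cleanly terminated by the `[DONE]` sentinel (Chat Completions style)."""
    events = []
    done = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            done = True
            break
        events.append(payload)
    return events, done
```

If requests that hit the 300s limit consistently show `done=False` while healthy requests show `done=True`, the terminator is being lost somewhere upstream of the aggregator rather than mishandled by it.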
Looking for suggestions: What logging or debugging strategies should I use to pinpoint exactly where the 300s is being “eaten up” when this happens?
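One strategy I’m considering, to make the 300s attributable to a specific phase: wrap the stream in a timing logger that records TTFT and any suspicious inter-chunk gaps against a monotonic clock. A stdlib-only sketch (the 5s “suspicious gap” threshold is arbitrary):

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("stream-timing")

def timed_stream(stream):
    """Pass chunks through unchanged, logging TTFT, large inter-chunk
    gaps, and total stream duration."""
    start = time.monotonic()
    last = start
    first = True
    for chunk in stream:
        now = time.monotonic()
        if first:
            log.info("TTFT: %.2fs", now - start)
            first = False
        elif now - last > 5.0:  # arbitrary threshold for a "suspicious" gap
            log.info("gap of %.2fs before chunk at t=%.2fs", now - last, now - start)
        last = now
        yield chunk
    log.info("stream closed after %.2fs", time.monotonic() - start)
```

Correlating these timestamps with TCP-level evidence (e.g. a packet capture on the same host) should then distinguish “bytes stopped arriving” (network) from “bytes arrived but weren’t processed” (my code).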
Thanks for any insights!