Debugging High TTFT and 300s Request Timeouts for GPT-5 / Reasoning Models (Server in SE Asia)

Hi everyone,

I’m experiencing an intermittent issue with GPT-5 / reasoning models that I can’t quite pin down. My server is located in Vietnam and calls the OpenAI API via the global endpoints.

The Setup:

  • TTFT (time-to-first-token) timeout: 60s (normal models) / 120s (reasoning models).

  • Global Timeout: 300s.
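For reference, this is roughly how the timeouts are wired up (a sketch assuming the official `openai` Python SDK, whose client is built on `httpx`; the variable names and exact values here are mine):

```python
import httpx
from openai import OpenAI

# In httpx, the `read` timeout applies to each individual read from the
# socket -- so for a streaming response it behaves as a per-chunk idle
# timeout, not a whole-body or TTFT timeout. That distinction matters for
# catching "silent hangs" where the connection stays open but data stops.
client = OpenAI(
    timeout=httpx.Timeout(connect=10.0, read=120.0, write=10.0, pool=10.0),
    max_retries=0,  # surface failures in my logs instead of silently retrying
)
```

(No test for this fragment since it is client configuration and needs a live API key.)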

The Problem: Most requests complete fine, but a random subset hits the 300s global timeout.

  • In some cases, the first token arrives within 120s, but the stream then “hangs” or crawls so slowly that the request exceeds the 300s global timeout.

  • In other cases, the request times out before the first token ever arrives.
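To tell these two failure modes apart in logs, I’ve been sketching a wrapper around the chunk iterator (all names here are mine, not from any SDK) that raises a distinct exception depending on which budget was blown:

```python
import time

class TTFTTimeout(Exception): pass    # no first token within the TTFT budget
class StallTimeout(Exception): pass   # stream went quiet mid-response
class GlobalTimeout(Exception): pass  # total wall time exceeded

def classify_stream(chunks, ttft=120.0, idle=60.0, total=300.0):
    """Yield chunks, raising a specific exception per failure mode.

    Caveat: a plain `for` loop blocks on the iterator, so a breach is only
    detected *after* the next chunk arrives (or the iterator returns). A hard
    cutoff needs a transport-level read timeout or a watchdog thread.
    """
    start = last = time.monotonic()
    first = True
    for chunk in chunks:
        now = time.monotonic()
        if first and now - start > ttft:
            raise TTFTTimeout(f"first token after {now - start:.1f}s")
        if not first and now - last > idle:
            raise StallTimeout(f"{now - last:.1f}s gap between chunks")
        if now - start > total:
            raise GlobalTimeout(f"{now - start:.1f}s total wall time")
        first = False
        last = now
        yield chunk
```

This at least labels each failure as “never started”, “stalled mid-stream”, or “slow overall” instead of one opaque 300s timeout.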

I’m trying to identify the root cause:

  1. Is anyone else seeing this? Specifically those calling the API from Southeast Asia?

  2. Is it the Model? Does GPT-5 occasionally “stall” mid-response or take massive reasoning pauses during a stream?

  3. Is it the Network? Could intermittent packet loss on undersea cables (VN to US/EU) cause a “silent hang” where the connection stays open but data stops flowing?

  4. Is it the Code? How can I verify if my local aggregator node is failing to catch an “End of Stream” signal?
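On point 4: for Chat Completions streaming, the SSE stream is terminated by a literal `data: [DONE]` event, so one sanity check is whether my aggregator ever sees that sentinel. A minimal checker over raw SSE lines (the helper name is mine; the Responses API signals completion differently, via a terminal event type):

```python
def saw_done_sentinel(raw_lines):
    """Scan raw SSE lines; return (data_events_seen, done_seen).

    If a request "hangs", done_seen=False suggests the server or something
    on the network path never delivered the terminator -- as opposed to my
    aggregator receiving it and failing to act on it.
    """
    events, done = 0, False
    for line in raw_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            done = True
        else:
            events += 1
    return events, done
```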

Looking for suggestions: What logging or debugging strategies should I use to pinpoint exactly where the 300s is being “eaten up” when this happens?
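Concretely, the instrumentation I’m planning is a timing wrapper that records TTFT and every inter-chunk gap, so a 300s failure can be attributed to “before first token”, “one huge gap”, or “death by a thousand slow chunks” (names are mine):

```python
import time

def timed_chunks(chunks):
    """Wrap a chunk iterator; return (iterator, stats) where stats records
    time-to-first-token, each inter-chunk gap, and total duration (seconds)."""
    stats = {"ttft": None, "gaps": [], "total": None}
    def gen():
        start = last = time.monotonic()
        for i, chunk in enumerate(chunks):
            now = time.monotonic()
            if i == 0:
                stats["ttft"] = now - start
            else:
                stats["gaps"].append(now - last)
            last = now
            yield chunk
        stats["total"] = time.monotonic() - start
    return gen(), stats
```

After a failed request, `max(stats["gaps"])` versus `stats["ttft"]` versus `stats["total"]` should show exactly where the budget went.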

Thanks for any insights!