I am using gpt-5.4-mini with the priority service tier.
My request latencies are around p50 621 ms, p95 1064 ms, p99 1504 ms.
Of course the prompts vary, but when I dig into the details, the requests with latency spikes have inputs almost identical to previous requests.
I’ve analysed input size vs latency, and input size explains only about 19% of the latency variance.
This made me wonder whether global routing could be the issue.
I am always on a quest for the lowest and most stable latency across all our requests.
I am led to believe that my requests are routed globally, but I don’t see any way in the OpenAI developer platform logs to tell which server/node/region handled a given request.
While i head down this route…
- Is there any way to see which location handled a request, so I can check whether this is the issue?
- The docs suggest that just adding a `us` prefix to the API endpoint will do it, but it errors for me (see Fig 1 below).
All ideas and thoughts welcome - TIA!
Fig 1
Endpoint: https://us.api.openai.com/v1/chat/completions
Model: gpt-5.4-mini
Response:
```
HTTP 401: Error code: 401 - {'detail': 'External API error: {
  "error": {
    "message": "Attempted to access resource with incorrect regional hostname. Please make your request to ``api.openai.com``",
    "type": "invalid_request_error",
    "code": "incorrect_hostname",
    "param": null
  },
  "status": 401
}'}
```
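One thing that might answer the "which location" question without a regional endpoint: the API sits behind Cloudflare, and responses usually carry a `cf-ray` header whose suffix is the IATA-style code of the Cloudflare datacenter that handled the request. This is a Cloudflare convention rather than anything OpenAI documents for routing, so treat it as a hint only. A minimal sketch (the example header values are made up):

```python
def cf_colo(headers):
    """Extract the Cloudflare datacenter code (e.g. 'IAD', 'LHR')
    from a response's cf-ray header, if present."""
    ray = {k.lower(): v for k, v in headers.items()}.get("cf-ray")
    if ray and "-" in ray:
        return ray.rsplit("-", 1)[1]
    return None

# Example with a captured header set (values invented for illustration):
headers = {"CF-RAY": "8f1a2b3c4d5e6f70-LHR", "openai-processing-ms": "612"}
print(cf_colo(headers))  # -> LHR
```

Logging this colo code (and the `openai-processing-ms` header, which I believe reflects server-side processing time) alongside each request's latency would let me check whether the spikes cluster on particular locations.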