I am making POST requests to the https://api.openai.com/v1/chat/completions endpoint to extract a list of properties from free-flow text and return them as JSON. I am using 500-1,000 prompt tokens, 1,500-2,500 completion tokens, and ~2,000 reasoning tokens per request, which seem like relatively small numbers.
Each request takes between 15 and 30 seconds to return a response. Is this normal? I've tried all the gpt-5 variants (vanilla, mini, and nano), and the response times for all of them fall in the same 15-30 second range.
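Here's a simplified sketch of what each request looks like (the prompt, model name, and property schema are placeholders, not my real ones; `reasoning_effort: "minimal"` is something I'm experimenting with to cut down the ~2,000 reasoning tokens, on the assumption that the gpt-5 family accepts that parameter on this endpoint):

```python
import json
import time
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(text: str, model: str = "gpt-5-mini") -> dict:
    """Build the Chat Completions request body for property extraction."""
    return {
        "model": model,
        # Assumption: gpt-5 models accept reasoning_effort here;
        # "minimal" should reduce reasoning-token overhead.
        "reasoning_effort": "minimal",
        # Ask for a JSON object back instead of free text.
        "response_format": {"type": "json_object"},
        "messages": [
            {
                "role": "system",
                "content": "Return a JSON object listing the properties "
                           "mentioned in the user's text.",
            },
            {"role": "user", "content": text},
        ],
    }

def timed_request(api_key: str, text: str):
    """POST the payload and return (elapsed_seconds, parsed_response)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return time.monotonic() - start, body
```

With this wrapper I log the elapsed time for every call, which is where the 15-30 second figures come from.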
For context, I have a new OpenAI account and am currently a "Tier 1" customer. I've read in other posts that Azure-hosted OpenAI has faster response times, but it's hard to believe that small requests to https://api.openai.com/v1/chat/completions with very little reasoning should consistently take this long. Is this expected?