GPT-5 + Responses API is extremely slow

I’m using the Agents SDK with the Responses API. I was testing gpt-5 before switching over and noticed that it takes around one minute even for a basic query.

Traces shows the following:

Step                     Model        Input Tokens   Output Tokens   Duration
Triage Agent, 1st call   gpt-5-nano   ~3k            ~2k             ~9 seconds
Sales Agent, 1st call    gpt-5        ~5k            ~2k             ~40 seconds
Sales Agent, 2nd call    gpt-5        ~9k            ~1k             ~20 seconds

The user query is “Hello”. The same system using the gpt-4.1 family with the same prompt/context/user query takes only around 2-5 seconds.

I think this is related to the Responses API, because people using gpt-5 through the Completions API don’t seem to have this problem.
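
For anyone who wants to check this themselves, here is a minimal timing sketch using the official openai Node SDK (the model name and “Hello” prompt are just placeholders), sending the same query to both endpoints:

import OpenAI from 'openai';

const client = new OpenAI();

// Same "Hello" query against the Responses API...
const t0 = Date.now();
const res = await client.responses.create({
  model: 'gpt-5',
  input: 'Hello',
});
console.log(`Responses API: ${Date.now() - t0} ms`, res.output_text);

// ...and against Chat Completions, for comparison.
const t1 = Date.now();
const chat = await client.chat.completions.create({
  model: 'gpt-5',
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(`Chat Completions: ${Date.now() - t1} ms`, chat.choices[0].message.content);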

Similar problems on other posts:

16 Likes

Is it really this crazily slow? I’ve seen lots of complaints since last night.

4 Likes

Just changing from the gpt-4.1 family to the gpt-5 family caused the response time of a simple “Hello” query with a ~5k-token system prompt to go up from ~2-5 seconds to ~1 minute.

Which user would wait ~1 minute for “Hello”?

Can you post links to those complaints here as well? We can collect them all in one place.

2 Likes

I think there is an infrastructure problem. If I set background to true and check the status, I see my requests just stuck in “queued”; they never switch to processing.
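
If anyone wants to reproduce this, a minimal background-mode sketch with the openai Node SDK looks roughly like this (the model and polling interval are arbitrary):

import OpenAI from 'openai';

const client = new OpenAI();

// Start the request in the background instead of waiting on it.
let res = await client.responses.create({
  model: 'gpt-5',
  input: 'Hello',
  background: true,
});

// Status should move from "queued" to "in_progress" to "completed";
// for me it never leaves "queued".
while (res.status === 'queued' || res.status === 'in_progress') {
  await new Promise((resolve) => setTimeout(resolve, 2000));
  res = await client.responses.retrieve(res.id);
  console.log(res.status);
}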

2 Likes

I also think it’s an infrastructure issue; it can’t be this slow by design. Even with low reasoning effort it’s taking 5-6 minutes now.

1 Like

Are you guys using Responses API or Completions API?

1 Like

I’m using the Responses API. Let me try Completions.
PS: No, it’s nearly the same. I think there are serious issues; it’s not usable this way. On a positive note, the results I received with a lot of patience were really promising.

3 Likes

I’ve been getting response errors on just about every request I make this morning. Sometimes I refresh 5+ times before ChatGPT’s replies generate.

2 Likes

I am using the Responses API. Prompts with c. 4k tokens have gone from c. 5 seconds to 30+. I had to go through all my tests lengthening timeouts just to see what it produces. The outputs I looked at did look nice, but niceness at that cost is not worthwhile. I was unable to complete a full eval run due to the slow response times. I gave up and went back to 4.1 for now.

Maybe you do need to tune the extra parameters, but at the moment I can’t run enough evaluations to assess the quality of the “less thinking” version.
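
For reference, this is roughly how the speed-related knobs look on the Responses API, assuming the gpt-5 parameter names I’ve seen documented (reasoning effort and text verbosity):

import OpenAI from 'openai';

const client = new OpenAI();

const res = await client.responses.create({
  model: 'gpt-5',
  input: 'Hello',
  // gpt-5 defaults to medium effort; 'minimal' is the fastest setting.
  reasoning: { effort: 'minimal' },
  // Lower verbosity shortens the visible output.
  text: { verbosity: 'low' },
});
console.log(res.output_text);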

1 Like

I’m not getting any errors, but every request to gpt-5 at the basic “medium” effort will reason through a ton of tokens, and then there’s no final output. This is with the Responses API.

I would occasionally get a response after several minutes last night, but now it’s producing nothing.

I got the same with the open-source model.

Are you sending, or missing, a preset max_completion_tokens that limits the output budget? It now sets how much you want to pay, not how much you want to see. The finish_reason will also be “length” if the output was truncated by that parameter before delivery.

There are so many prompt tune-ups for bad behavior that you could send a book of instructions and the gpt-5 model would still ignore it.
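
A quick way to confirm the truncation theory on Chat Completions, as a rough sketch (the 1024 budget is just an example):

import OpenAI from 'openai';

const client = new OpenAI();

const chat = await client.chat.completions.create({
  model: 'gpt-5',
  messages: [{ role: 'user', content: 'Hello' }],
  max_completion_tokens: 1024, // reasoning tokens count against this budget
});

// An empty message with finish_reason "length" means the budget was
// spent on reasoning before any visible output was produced.
const choice = chat.choices[0];
if (choice.finish_reason === 'length') {
  console.warn('Truncated: raise max_completion_tokens or lower reasoning effort.');
}
console.log(choice.message.content);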

1 Like

GPT-5 is unusable for me too (Responses API). Had to revert to 4.1. It’s super slow and I keep getting failures due to max tokens, even with verbosity set to low. Never had that happen before.

4 Likes

You are all leaving out information about the most important parameter when it comes to speed: “reasoning effort”.

I see it’s faster now, but a few hours ago it was taking around 30 seconds at “minimal” and several minutes at “low”. I didn’t try the other two. I saw in the Playground that it was thinking very slowly, so I think it was an overload issue on their side.

This was super helpful - thank you.

It turned out my blank responses were because the reasoning tokens were hitting 2048, which was apparently the default output allowance. Once I bumped max_completion_tokens up to 5000, GPT-5 can’t stop talking. :slight_smile:
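
For the Responses API the equivalent knob is max_output_tokens, if I’m reading the docs right; a truncated run comes back with status “incomplete” rather than a finish_reason:

import OpenAI from 'openai';

const client = new OpenAI();

const res = await client.responses.create({
  model: 'gpt-5',
  input: 'Hello',
  max_output_tokens: 5000, // shared budget for reasoning + visible output
});

// Truncation shows up as an incomplete response, not an error.
if (res.status === 'incomplete' && res.incomplete_details?.reason === 'max_output_tokens') {
  console.warn('Ran out of output budget before the final answer.');
}
console.log(res.output_text);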

1 Like

I had it set to minimal and even then it was producing slow, unreliable responses.

5 is also very slow for me and breaks tool calling.

Still slow for me today.

4o and 4.1 respond near instantly, whereas 5 and 5-mini take around 30-60 seconds to respond.

I’m just using the basic setup from the Vercel docs.

import { openai } from '@ai-sdk/openai';
import { convertToModelMessages, streamText, UIMessage } from 'ai';
import { NextRequest } from 'next/server';

export async function POST(req: NextRequest) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const system = `PROMPT...`.trim();

  const result = streamText({
    model: openai('gpt-5'),
    system,
    messages: convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}
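
If the default reasoning effort is the bottleneck, the AI SDK seems to accept provider-specific options; something like this should work (reasoningEffort is the @ai-sdk/openai option name as I understand it, so double-check it against your SDK version):

import { openai } from '@ai-sdk/openai';
import { convertToModelMessages, streamText, UIMessage } from 'ai';
import { NextRequest } from 'next/server';

export async function POST(req: NextRequest) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai('gpt-5'),
    messages: convertToModelMessages(messages),
    // Pass OpenAI-specific settings through provider options.
    providerOptions: {
      openai: { reasoningEffort: 'minimal' }, // default is 'medium'
    },
  });

  return result.toUIMessageStreamResponse();
}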

I won’t be able to use 5 until this is resolved.

Why was this flagged? I love ChatGPT and I really, really want this to work, but using it every day has become unbearable. It goes back and forth: sometimes it’s somewhat fast, and by fast I mean at most 10 seconds for a response, which is already bad, and sometimes I wait 5 minutes for a response. Do you have any idea how agonizingly painful that is when you are trying to code something? Clearly something is wrong and needs to be addressed, but what worries me even more is that there is no official communication acknowledging the issues, at least none that I have seen. Everyone from OpenAI is operating as if everything is great.

1 Like

Look up top:

(screenshot of the thread header)

Everything you see there makes discussion of the consumer ChatGPT product off-topic.
ChatGPT isn’t something API-using developers can help you improve, except to tell you to pick “mini” in the new model selector for a faster start of visible output.

1 Like