Stateful Responses API Much Slower Than Chat Completions

I mentioned this in another thread as an indirectly related issue, but it seems specific enough to warrant its own topic.

On Chat Completions, GPT-5 (minimal reasoning) averages about 5-7 seconds for me, even with a 50+ message history and almost 100k tokens. On the Responses API with the same setup and the same number of messages/tokens, it averages 11 seconds or more. I made sure the reasoning level, verbosity, etc., are identical as well.

It seems to be related to the use of previous_response_id. If I don't use it and instead send the full message history myself, response times drop back to about 2-3 seconds. With previous_response_id, however, even a chain of just a few messages and ~10k tokens quickly pushes latency past 10 seconds.
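For anyone who wants to reproduce this, here is a minimal sketch of the two calling patterns I'm comparing: chaining turns with previous_response_id (stateful) versus resending the whole history yourself (stateless). The prompts and history here are made up for illustration, and you need the openai package plus an OPENAI_API_KEY to actually run the timing part.

```python
import time


def build_stateless_input(history, user_message):
    """Stateless pattern: resend the full message list every turn,
    instead of chaining turns with previous_response_id."""
    return history + [{"role": "user", "content": user_message}]


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def main():
    # Requires the openai package and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()
    # Toy history for illustration only.
    history = [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi! How can I help?"},
    ]

    # Stateful pattern: chain via previous_response_id (the slow case).
    first = client.responses.create(model="gpt-5", input=history)
    _, t_chained = timed(
        client.responses.create,
        model="gpt-5",
        input="Summarize our chat so far.",
        previous_response_id=first.id,
    )

    # Stateless pattern: resend the whole history yourself (the fast case).
    _, t_stateless = timed(
        client.responses.create,
        model="gpt-5",
        input=build_stateless_input(history, "Summarize our chat so far."),
    )
    print(f"chained: {t_chained:.2f}s  stateless: {t_stateless:.2f}s")


if __name__ == "__main__":
    main()
```

In my tests the stateless variant is consistently the faster of the two, which is what points me at previous_response_id as the culprit.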
