Stateful Responses API Much Slower Than Chat Completions

pftq · August 26, 2025, 5:42am

I posted this as indirectly related in another thread, but the issue seems specific enough to warrant its own thread.

On Chat Completions, GPT-5 (minimal reasoning) averages about 5-7 seconds for me, even with a history of 50+ messages and almost 100k tokens. On the Responses API, with the same setup and the same number of messages and tokens, it averages 11 seconds or longer. I made sure the reasoning level, verbosity, etc. are the same as well.

It seems to be related to the use of previous_response_id. If I don’t use it, I can get response times back down to about 2-3 seconds. With it, however, even a chain of just a few messages and 10k tokens quickly pushes the latency past 10 seconds.
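
For anyone who wants to reproduce the comparison, here is a rough timing sketch, not my exact test code. It assumes the official `openai` Python package and a client created with `OpenAI()`; the helper names `time_call` and `compare_latency`, the default model string, and the repeat count are all made up for illustration. The stateful call assumes `previous_response_id` points at a stored response that already carries the same history.

```python
import time
from statistics import mean


def time_call(fn, repeats=3):
    """Return the mean wall-clock latency (seconds) of calling fn() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def compare_latency(client, history, new_message, previous_response_id,
                    model="gpt-5", repeats=3):
    """Time the same turn three ways (hypothetical helper, not part of any SDK).

    history: list of {"role": ..., "content": ...} dicts for earlier turns.
    new_message: the dict for the latest user turn.
    previous_response_id: id of a stored response that already holds `history`.
    """
    full_input = history + [new_message]
    return {
        # Chat Completions: the whole history is resent on every call.
        "chat_completions": time_call(
            lambda: client.chat.completions.create(
                model=model,
                messages=full_input,
                reasoning_effort="minimal",
            ),
            repeats,
        ),
        # Responses API without server-side state: history resent as input.
        "responses_stateless": time_call(
            lambda: client.responses.create(
                model=model,
                input=full_input,
                reasoning={"effort": "minimal"},
            ),
            repeats,
        ),
        # Responses API with server-side state: only the new turn is sent.
        "responses_stateful": time_call(
            lambda: client.responses.create(
                model=model,
                input=[new_message],
                reasoning={"effort": "minimal"},
                previous_response_id=previous_response_id,
            ),
            repeats,
        ),
    }
```

Passing the same `history`, `new_message`, and response id to all three paths keeps the token count comparable, so any gap in the returned averages should come from the endpoint or from `previous_response_id` rather than from the prompt.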
