I’m comparing the Completions API with the new Responses API.
With Completions, I send the full conversation history in each request. With Responses, I only reference the previous message using previous_response_id, so I expected lower input token usage.
But in practice, I’m seeing almost the same number of input tokens on both sides.
Does the Responses API still internally process the full history? I thought it would manage context more efficiently and reduce cost.
Any insights?
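
Here's roughly what I'm testing, as a simplified sketch with the openai Python SDK (the model name and prompts are just examples):

```python
from openai import OpenAI

client = OpenAI()

# Chat Completions: resend the whole history every turn
history = [{"role": "user", "content": "Hello"}]
chat = client.chat.completions.create(model="gpt-4o-mini", messages=history)
history.append({"role": "assistant", "content": chat.choices[0].message.content})
history.append({"role": "user", "content": "Tell me more"})
chat2 = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print("completions input tokens:", chat2.usage.prompt_tokens)

# Responses: pass only the new turn plus previous_response_id
first = client.responses.create(model="gpt-4o-mini", input="Hello")
second = client.responses.create(
    model="gpt-4o-mini",
    input="Tell me more",
    previous_response_id=first.id,
)
print("responses input tokens:", second.usage.input_tokens)  # ~same count
```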
Exactly, it is still the same input context.
Just a bit more convenient.
The benefit of previous_response_id is that you don't have to store your own context; all you need is the ID. It still stands that whenever the context changes, the entire thing has to be run through the LLM all over again, and you're billed for all input tokens on each iteration.
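
To make this concrete, here's a minimal sketch (openai Python SDK, example model name) showing that input_tokens still grows with every turn even though you only send the newest message:

```python
from openai import OpenAI

client = OpenAI()

prev_id = None
for question in ["What is an API?", "Give an example.", "Now in Python."]:
    response = client.responses.create(
        model="gpt-4o-mini",
        input=question,
        previous_response_id=prev_id,  # None on the first turn
    )
    # input_tokens counts the whole reconstructed history, not just `question`
    print(question, "->", response.usage.input_tokens, "input tokens")
    prev_id = response.id
```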
If your prompt is 1024 tokens or longer, you may benefit from prompt caching discounts on input tokens, but this happens automatically and regardless of whether you use the new Responses API. Other ways to reduce cost include downgrading to a cheaper model, distillation (into another OpenAI model), optimizing your prompts and output format, and using the Batch API or flex processing.
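
If you want to check whether the cache is kicking in, the usage object breaks it out. A sketch, assuming the openai Python SDK (cached_tokens only becomes nonzero once a stable prompt prefix passes the ~1024-token minimum):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",
    input="...a long, stable prompt prefix plus the new user turn...",
)
usage = response.usage
print("input tokens:", usage.input_tokens)
# Tokens served from the prompt cache are billed at a discounted rate
print("cached tokens:", usage.input_tokens_details.cached_tokens)
```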