Conversation context and quadratic billing

nir.01 · March 29, 2023, 8:48am

Hi,

After having wonderful chats with gpt4 via chatgpt plus I wanted to check the API.

but I noticed billing per token is applied again and again to the entire history of the growing conversation.

This means quadratic billing in the length of the conversation.

this sound crazy and I do not comprehend how it is considered remotely usable.

I was sure there must be a context mechanism that prevents this quadratic cost (and wasted computation) but I could not find one in the docs, by asking gpt itself, or users in the discord server.

Is there such a thing?

Thanks,

Nir

paul.armstrong · March 29, 2023, 10:08am

I think you are observing that a “conversation” is actually stateless. The conversation history is not remembered but has to be repeated every time.

Making the cost grow with every request.

That is how it works, yes.

linus · March 29, 2023, 10:28am

Hi @nir.01,

as @paul.armstrong mentioned this is the case.

There are some strategies you could deploy to help you on this, for example: OpenAI API: chat completion pruning methods this is a great way to reduce tokens. Or to limit the resubmitted messages to the last 5 ones in your request.

bill.french · March 29, 2023, 1:02pm

I think there’s a fair bit of misunderstanding about OpenAI’s intentions concerning their GPT apps. I defer to @logankilpatrick, but I believe apps like ChatGPT are intended to be demonstrable examples that help everyone use LLMs and discover possibilities while envisioning comprehensive solutions.

In my view, ChatGPT is not to be regarded as a solution to any specific personal or business product.

The entire point of a comprehensive API and a community that leans into the development of OpenAI apps is to use that API to build stuff that solves problems with AGI. If you are looking for a solution that avoids the pitfalls of quadratic billing costs, you need to look to developers who know how to craft a solution that optimizes for costs using all of the tools that OpenAI has made available.

This might involve more than one of the APIs and likely other technologies like real-time databases, vector data stores, and other supporting infrastructure that would make your vision financially practical while meeting your objectives.

davide.fiocco · September 25, 2025, 4:39pm

For anyone reading this in 2025 and beyond, I salute you with https://openai.com/index/api-prompt-caching/. That’s supposed to mitigate the issue discussed here. Cheers!

Topic		Replies	Views
Retain past responses in memory without sending them again at every API request API gpt-4 , gpt-35-turbo , chatgpt	11	11559	January 25, 2024
Is possible OpenAI API caching the conversation? API	4	4471	June 4, 2024
A conversation using the API API	6	3138	December 16, 2023
Why does each new request in Realtime API get more expensive? Are tokens accumulating? API realtime , api-realtime	1	244	September 5, 2025
Efficient stateful completion chatbot API	10	5471	July 9, 2024

Conversation context and quadratic billing

Related topics